UNIFORM MOMENT BOUNDS OF FISHER'S INFORMATION
WITH APPLICATIONS TO TIME SERIES 

By Ngai Hang Chan¹ and Ching-Kang Ing²

Chinese University of Hong Kong and Academia Sinica 

In this paper, a uniform (over some parameter space) moment
bound for the inverse of Fisher's information matrix is established.
This result is then applied to develop moment bounds for the normalized least squares estimate in (nonlinear) stochastic regression
models. The usefulness of these results is illustrated using time series
models. In particular, an asymptotic expression for the mean squared
prediction error of the least squares predictor in autoregressive moving average models is obtained. This asymptotic expression provides
a solid theoretical foundation for some model selection criteria.

1. Introduction. Moment inequalities and moment bounds have long
been vibrant topics in modern probability and statistics. The celebrated
inequalities of Burkholder [3] and Doob [5] are prime examples of
the importance of moment inequalities. Moment bounds make it possible to quantify the order
of magnitude of the spectral norm of the inverse of Fisher's information matrix. They can also be used to establish the consistency and efficiency of least squares
estimates in stochastic regression and adaptive control;
see, for example, the seminal work of Lai and Wei [15] and the succinct
review of Lai and Ying [16]. In this paper, a uniform (over some parameter
space) moment bound for the inverse of Fisher's information matrix is
established. This bound is used to investigate the moment properties of least
squares estimates and the mean squared prediction error (MSPE) for time
series models.
To appreciate the significance of uniform moment bounds, consider the
stochastic regression model

$$y_t = g_t(\theta_0) + \varepsilon_t, \qquad t = 1, \ldots, n, \tag{1.1}$$

where $g_t(\cdot)$ is a random function, $\theta_0$ is an unknown parameter and $\{\varepsilon_t\}$ is
a martingale difference sequence. Two important problems are associated
with this model.

The first problem is mean squared prediction. In practice, the
unknown parameter $\theta_0$ is usually estimated by the least squares estimate
$\hat\theta_n$, which minimizes $S_n(\theta) = \sum_{t=1}^n (y_t - g_t(\theta))^2$. The (strong) law
of large numbers (LLN) and the central limit theorem (CLT) for $\hat\theta_n$ have been
established under certain assumptions on $g_t(\cdot)$ and $\varepsilon_t$ (see, among others,
Lai [14] and Skouras [19)). By contrast, relatively little is known about the moment convergence of $\hat\theta_n$. Moment convergence of $\hat\theta_n$ offers important insight into the
mean squared prediction problem. To see this, suppose that
$n^{1/2}(\hat\theta_n - \theta_0)$ is asymptotically normal with mean zero and variance $\eta > 0$.
An immediate question is then whether

$$E|n^{1/2}(\hat\theta_n - \theta_0)|^q = O(1), \qquad q \ge 1. \tag{1.2}$$

In particular, if (1.2) holds for some $q > 2$, then $\{n(\hat\theta_n - \theta_0)^2\}$ is uniformly
integrable, and consequently $\lim_{n\to\infty} nE(\hat\theta_n - \theta_0)^2 = \eta$. This result yields
an asymptotic expression for the mean squared error of
$\hat\theta_n$,

$$E(\hat\theta_n - \theta_0)^2 = \frac{\eta}{n} + o(n^{-1}).$$

From this expression one can derive asymptotic properties of the MSPE of the least squares predictor
$g_{n+1}(\hat\theta_n)$ of $y_{n+1}$, $E(y_{n+1} - g_{n+1}(\hat\theta_n))^2$; see Sections 2
and 3 for further details.
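The Python sketch below gives a rough numerical illustration of (1.2) and of the limit $nE(\hat\theta_n-\theta_0)^2 \to \eta$. It simulates the simplest special case of (1.1): an AR(1) model $y_t = \theta_0 y_{t-1} + \varepsilon_t$ with i.i.d. standard normal errors and $y_0 = 0$. In this case the least squares estimate has a closed form and $\eta = 1 - \theta_0^2$. The model, sample sizes and number of replications are illustrative assumptions and play no part in the paper's argument. A Monte Carlo average cannot establish uniform integrability; it only shows what the limit looks like.

```python
import random


def ls_ar1(y):
    """Least squares estimate of theta in y_t = theta * y_{t-1} + e_t (no intercept)."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den


def scaled_mse(theta0, n, reps, seed=0):
    """Monte Carlo estimate of n * E(theta_hat - theta0)^2.

    Illustrative assumption: AR(1) data with N(0, 1) errors and y_0 = 0.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        y = [0.0]  # initial condition y_0 = 0
        for _ in range(n):
            y.append(theta0 * y[-1] + rng.gauss(0.0, 1.0))
        total += n * (ls_ar1(y) - theta0) ** 2
    return total / reps


if __name__ == "__main__":
    theta0 = 0.5
    for n in (100, 400):
        print(n, round(scaled_mse(theta0, n, reps=1000), 3), "vs eta =", 1 - theta0 ** 2)
```

Both printed values should lie near $\eta = 0.75$, up to Monte Carlo error.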

To establish (1.2), consider Fisher's information number
$n^{-1}\sum_{t=1}^n (g_t'(\theta))^2$ of (1.1), where $g_t'(\theta) = dg_t(\theta)/d\theta$. Section 2 shows that a uniform negative moment bound for
$n^{-1}\sum_{t=1}^n (g_t'(\theta))^2$, namely, for any $q \ge 1$,

$$E\Big\{\sup_{\theta \in B_{\delta_1}(\theta_0)} \Big(n^{-1}\sum_{t=1}^n (g_t'(\theta))^2\Big)^{-q}\Big\} = O(1), \tag{1.3}$$

plays a crucial role in proving (1.2). Here $B_{\delta_1}(\theta_0) = \{\theta: |\theta - \theta_0| < \delta_1\}$ for
some $\delta_1 > 0$.

The second problem in stochastic regression, equally important, is
model selection. To see how the uniform moment bound bears on
this issue, consider the case where $g_t(\cdot)$ in (1.1) contains $k \ge 1$ unknown
parameters $\theta_0 \in \mathbb{R}^k$. A multiparameter generalization of (1.3) is then: for
any $q \ge 1$,

$$E\Big\{\sup_{\theta \in B_{\delta_1}(\theta_0)} \lambda_{\min}^{-q}\Big(n^{-1}\sum_{t=1}^n \nabla g_t(\theta)(\nabla g_t(\theta))^T\Big)\Big\} = O(1), \tag{1.4}$$

where $\lambda_{\min}(L)$ denotes the minimum eigenvalue of the matrix $L$ and $\nabla g_t(\theta)$
denotes the gradient vector of $g_t(\theta)$. In particular, suppose $g_t(\theta) = g_t(\theta_1, \ldots, \theta_k) = \theta_1 y_{t-1} + \cdots + \theta_k y_{t-k}$ in (1.1), so that $y_t$ is an autoregressive (AR)
model of order $k$. Then (1.4) reduces to

$$E\Big\{\lambda_{\min}^{-q}\Big(n^{-1}\sum_{t=1}^n \mathbf{y}_{t-1}(k)\mathbf{y}_{t-1}^T(k)\Big)\Big\} = O(1), \tag{1.5}$$

where $\mathbf{y}_t(k) = (y_t, \ldots, y_{t-k+1})^T$. Findley and Wei [7] established (1.5) by imposing a Lipschitz-type condition on
the distribution function of $\varepsilon_t$ and a stationarity condition on $g_t(\cdot)$. This gave a rigorous mathematical
derivation of the AIC model selection criterion for weakly stationary AR
processes. Proving (1.4) for a general stochastic regression
model is much harder than proving (1.5), because (1.4) contains an "extra"
supremum over an uncountable set inside the expectation. Nevertheless, as in the AR case, negative uniform moment bounds for Fisher's information matrix such as (1.4) are an
indispensable tool for the model selection problem.
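Bound (1.5) concerns inverse moments of the smallest eigenvalue of the normalized AR design matrix. The Python sketch below estimates $E\{\lambda_{\min}^{-q}(\cdot)\}$ by simulation. It takes $k = 2$ and $q = 2$, with data from an AR(1) model with coefficient 0.5 and standard normal errors, so the fitted design is a deliberately overfitted AR(2). All of these choices are illustrative assumptions. The averages should stay bounded as $n$ grows and approach $\lambda_{\min}^{-q}(\Gamma)$, where $\Gamma$ is the $2\times 2$ autocovariance matrix of the process. Here $\lambda_{\min}(\Gamma) = 2/3$. The check is only a sanity check. It does not replace the Lipschitz-type conditions under which (1.5) is proved.

```python
import math
import random


def lam_min_2x2(a, b, c):
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    return 0.5 * (a + c) - math.hypot(0.5 * (a - c), b)


def inverse_moment(n, q, phi=0.5, reps=300, seed=0):
    """Monte Carlo estimate of E{lambda_min^{-q}(n^{-1} sum_t y_{t-1}(2) y_{t-1}(2)^T)}.

    Illustrative assumption: y_t = phi * y_{t-1} + e_t with N(0, 1) errors.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        y = [0.0, 0.0]  # y[0] = y_{-1} = 0 and y[1] = y_0 = 0; y[t + 1] stores y_t
        for _ in range(n):
            y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
        s11 = s12 = s22 = 0.0
        for t in range(1, n + 1):
            u, v = y[t], y[t - 1]  # regressor y_{t-1}(2) = (y_{t-1}, y_{t-2})
            s11 += u * u
            s12 += u * v
            s22 += v * v
        total += lam_min_2x2(s11 / n, s12 / n, s22 / n) ** (-q)
    return total / reps


if __name__ == "__main__":
    # For phi = 0.5 the limiting matrix has smallest eigenvalue 2/3, so the limit is 2.25.
    for n in (100, 400):
        print(n, round(inverse_moment(n, q=2), 3))
```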

The rest of this paper is organized as follows. In Section 2, Theorem 2.1 first shows
that (1.4) holds in a more general setting, in which $B_{\delta_1}(\theta_0)$
is replaced by a bounded subset of $\mathbb{R}^k$ and $\nabla g_t(\theta)$ is replaced by a vector-valued random function $\mathbf{f}_t(\theta)$, $\theta \in \Theta$, satisfying certain assumptions. Theorem 2.2
then applies Theorem 2.1 to establish the moment convergence of least squares
estimates in (nonlinear) stochastic regression models. Section 3 applies Theorems 2.1 and 2.2 to autoregressive
moving average (ARMA) models. In particular, it establishes the moment convergence of
the least squares estimates and an asymptotic expression (up to terms of
order $n^{-1}$) for the MSPE of the least squares predictor for ARMA models. To facilitate the presentation, technical results of Sections 2
and 3 are deferred to Appendices A and B, respectively.

2. Uniform bounds on negative moments. Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $\{\mathcal{F}_t\}$ be an increasing sequence of $\sigma$-fields on $(\Omega, \mathcal{F}, P)$. Let
$\mathbf{f}_t(\theta)$, $t = 1, \ldots, n$, be $r$-dimensional $\mathcal{F}_t$-measurable random functions of a
parameter vector $\theta = (\theta_1, \ldots, \theta_k)^T \in \Theta \subset \mathbb{R}^k$. In the first half of this section,
we give sufficient conditions under which the minimum eigenvalue
$\lambda_{\min}(n^{-1}\sum_{t=1}^n \mathbf{f}_t(\theta)\mathbf{f}_t^T(\theta))$ of the normalized matrix $n^{-1}\sum_{t=1}^n \mathbf{f}_t(\theta)\mathbf{f}_t^T(\theta)$ satisfies
the following uniform moment bound:

$$E\Big\{\sup_{\theta\in\Theta}\lambda_{\min}^{-q}\Big(n^{-1}\sum_{t=1}^n \mathbf{f}_t(\theta)\mathbf{f}_t^T(\theta)\Big)\Big\} = O(1) \quad \text{for any } q \ge 1. \tag{2.1}$$

This uniform negative moment bound is applied to investigate the moment 
properties of least squares estimates in the second half of this section. To 
begin, assume the following conditions: 

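(C1) $\mathbf{f}_t(\theta)$ is continuous on $\Theta$, and $\Theta$ is a bounded subset of $\mathbb{R}^k$;
(C2) there exist a positive integer $d$ and positive numbers $\delta$, $\alpha$ and $M$ such
that for any $t > d$, any $0 < s_2 - s_1 \le \delta$, any $\theta \in \Theta$ and any $\|\mathbf{a}\| = 1$,

$$P(s_1 < \mathbf{a}^T\mathbf{f}_t(\theta) \le s_2 \mid \mathcal{F}_{t-d}) \le M(s_2 - s_1)^{\alpha} \quad \text{a.s.},$$

where $\|\mathbf{a}\|$ denotes the Euclidean norm of the vector $\mathbf{a} \in \mathbb{R}^r$;
(C3) there exist $\tau > 0$ and nonnegative random variables $B_t$ satisfying
$\sup_{t\ge1} E(B_t) < C_1$ for some $C_1 > 0$ such that, for all $\xi_1, \xi_2 \in \Theta$ with $\|\xi_1 - \xi_2\| < \tau$,

$$\|\mathbf{f}_t(\xi_1) - \mathbf{f}_t(\xi_2)\| \le B_t\|\xi_1 - \xi_2\| \quad \text{a.s.};$$

(C4) there exists $C_2 > 0$ such that $\sup_{t\ge1} E(\sup_{\theta\in\Theta}\|\mathbf{f}_t(\theta)\|^2) < C_2$.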

(C1) is a standard assumption on the regression function and its gradient vector in nonlinear regression; see, for example, Lai [14] and Robinson
and Hidalgo [17]. (C2) says the following. Given the information ($\sigma$-field) at a time
sufficiently earlier than the current time $t$, the conditional
distribution of $\mathbf{a}^T\mathbf{f}_t(\theta)$ satisfies a local Lipschitz condition of order $\alpha$ at all
points $\theta \in \Theta$ and in all directions $\mathbf{a}$ with $\|\mathbf{a}\| = 1$. In the special case where $\Theta$
contains only one point, (C2) is related to Findley and Wei's [7] uniform
Lipschitz condition over all directions. That condition is the key assumption used in
deriving the AIC for stationary AR models. In this paper we must handle the
supremum over a class of inverses of minimum eigenvalues indexed by $\theta$, so a
Lipschitz-type condition at all points ($\theta$) and in all directions ($\mathbf{a}$) is required. As Section 3 shows, (C2) is flexible enough to cover many time series applications. Andrews [2] and Skouras [19] imposed conditions like (C3)
on the regression function to prove the
uniform law of large numbers for random functions associated with $S_n(\theta)$.
(C3) can be verified when $\mathbf{f}_t(\theta)$ is sufficiently smooth; see (3.26) for more
details. (C4) is a mild moment condition on $\mathbf{f}_t(\theta)$ that appears to be
satisfied in many practical situations. Moreover, (C4) can be weakened to
$\sup_{t\ge1}\sup_{\theta\in\Theta} E(\|\mathbf{f}_t(\theta)\|^2) < C_2$ for some $C_2 > 0$, at the price of strengthening
the condition on $B_t$ in (C3) to $\sup_{t\ge1} E(B_t^2) < C_1$ for some $C_1 > 0$.

Theorem 2.1. Assume that (C1)-(C4) hold. Then inequality (2.1) is 
true. 
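Proof. First, note that $\sup_{\theta\in\Theta}\lambda_{\min}^{-q}(n^{-1}\sum_{t=1}^n \mathbf{f}_t(\theta)\mathbf{f}_t^T(\theta))$ is measurable because $\mathbf{f}_t(\theta)$ is continuous. Define $n_d = \lfloor (n-d)/d \rfloor$, where
$\lfloor a \rfloor$ is the largest integer $\le a$. Then for $n$ large,

$$\begin{aligned}
E\Big\{\sup_{\theta\in\Theta}\lambda_{\min}^{-q}\Big(n^{-1}\sum_{t=1}^n \mathbf{f}_t(\theta)\mathbf{f}_t^T(\theta)\Big)\Big\}
&\le E\Big\{\sup_{\theta\in\Theta}\Big[\frac{n_d}{n}\sum_{j=1}^d \lambda_{\min}\Big(n_d^{-1}\sum_{l=0}^{n_d-1}\mathbf{f}_{(l+1)d+j}(\theta)\mathbf{f}^T_{(l+1)d+j}(\theta)\Big)\Big]^{-q}\Big\}\\
&\le \Big(\frac{n}{n_d d}\Big)^{q}\frac{1}{d}\sum_{j=1}^d E\Big\{\sup_{\theta\in\Theta}\lambda_{\min}^{-q}\Big(n_d^{-1}\sum_{l=0}^{n_d-1}\mathbf{f}_{(l+1)d+j}(\theta)\mathbf{f}^T_{(l+1)d+j}(\theta)\Big)\Big\}.
\end{aligned}\tag{2.2}$$

The first inequality holds because $\lambda_{\min}(E_1 + E_2) \ge \lambda_{\min}(E_1) + \lambda_{\min}(E_2)$ for symmetric matrices
$E_1$ and $E_2$, and because the terms $\mathbf{f}_t\mathbf{f}_t^T$ left out of the blocks are positive semidefinite. The second holds by
the convexity of $x^{-q}$, $x > 0$. The key step toward (2.1) is shown in Appendix A using (C2)-(C4): there exists a
positive integer $m$, depending only on $q$, $r$, $k$ and $\alpha$, such that for all large
$n$, all $0 \le l \le n_d - m$ and all $1 \le j \le d$,

$$E\Big\{\sup_{\theta\in\Theta}\lambda_{\min}^{-q}\Big(\sum_{i=0}^{m-1}\mathbf{f}_{(l+i+1)d+j}(\theta)\mathbf{f}^T_{(l+i+1)d+j}(\theta)\Big)\Big\} \le C_3, \tag{2.3}$$

where $C_3$ is some positive constant independent of $l$ and $j$. Let $n_{d,m} = \lfloor n_d/m \rfloor$. Then, as in (2.2),

$$\lambda_{\min}^{-q}\Big(n_d^{-1}\sum_{l=0}^{n_d-1}\mathbf{f}_{(l+1)d+j}(\theta)\mathbf{f}^T_{(l+1)d+j}(\theta)\Big)
\le \Big(\frac{n_d}{n_{d,m}}\Big)^{q}\frac{1}{n_{d,m}}\sum_{s=0}^{n_{d,m}-1}\lambda_{\min}^{-q}\Big(\sum_{i=0}^{m-1}\mathbf{f}_{(i+sm+1)d+j}(\theta)\mathbf{f}^T_{(i+sm+1)d+j}(\theta)\Big).$$

Combine this with (2.2), and with (2.3) applied at $l = sm$ (note that $sm \le n_d - m$ for $0 \le s \le n_{d,m} - 1$). This yields, for $n$ large and some
positive number $C_4$,

$$\begin{aligned}
E\Big\{\sup_{\theta\in\Theta}\lambda_{\min}^{-q}\Big(n^{-1}\sum_{t=1}^n \mathbf{f}_t(\theta)\mathbf{f}_t^T(\theta)\Big)\Big\}
&\le \frac{n^q}{(n_{d,m}d)^q\, d\, n_{d,m}}\sum_{j=1}^d\sum_{s=0}^{n_{d,m}-1}E\Big\{\sup_{\theta\in\Theta}\lambda_{\min}^{-q}\Big(\sum_{i=0}^{m-1}\mathbf{f}_{(i+sm+1)d+j}(\theta)\mathbf{f}^T_{(i+sm+1)d+j}(\theta)\Big)\Big\}\\
&\le \Big(\frac{n}{n_{d,m}d}\Big)^{q}C_3 \le C_4.
\end{aligned}$$

Thus, (2.1) follows. □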
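The proof rests on two elementary facts and some index bookkeeping. The Python sketch below checks them numerically. First, it checks Weyl's inequality $\lambda_{\min}(E_1+E_2)\ge\lambda_{\min}(E_1)+\lambda_{\min}(E_2)$ for random symmetric $2\times 2$ matrices. Second, it checks the convexity bound behind the second inequality in (2.2). Third, it checks that the indices $(l+1)d+j$ in (2.2) are distinct and lie in $\{d+1,\ldots,n\}$, and that every block start $l = sm$ satisfies $l \le n_d - m$, as (2.3) requires. The restriction to $2\times 2$ matrices and the particular values of $n$, $d$ and $m$ are illustrative choices. This is a spot check, not a proof.

```python
import math
import random


def lam_min(m):
    """Smallest eigenvalue of a symmetric 2x2 matrix m = ((a, b), (b, c))."""
    (a, b), (_, c) = m
    return 0.5 * (a + c) - math.hypot(0.5 * (a - c), b)


def mat_add(m1, m2):
    return tuple(tuple(x + y for x, y in zip(r1, r2)) for r1, r2 in zip(m1, m2))


def random_symmetric(rng):
    a, b, c = (rng.uniform(-2.0, 2.0) for _ in range(3))
    return ((a, b), (b, c))


def check_superadditivity(trials=1000, seed=0):
    """Weyl: lambda_min(E1 + E2) >= lambda_min(E1) + lambda_min(E2) for symmetric E1, E2."""
    rng = random.Random(seed)
    for _ in range(trials):
        e1, e2 = random_symmetric(rng), random_symmetric(rng)
        if lam_min(mat_add(e1, e2)) < lam_min(e1) + lam_min(e2) - 1e-12:
            return False
    return True


def check_convexity(q, xs):
    """Convexity of x^{-q}: (mean of xs)^{-q} <= mean of xs^{-q} for positive xs."""
    d = len(xs)
    return (sum(xs) / d) ** (-q) <= sum(x ** (-q) for x in xs) / d + 1e-12


def check_blocks(n, d, m):
    """Index bookkeeping behind (2.2) and the sub-blocking step that uses (2.3)."""
    n_d = (n - d) // d
    # ell plays the role of l in (2.2)
    idx = [(ell + 1) * d + j for j in range(1, d + 1) for ell in range(n_d)]
    distinct = len(idx) == len(set(idx))
    in_range = all(d + 1 <= t <= n for t in idx)
    n_dm = n_d // m
    # each sub-block starts at l = s*m and must satisfy l <= n_d - m
    starts_ok = all(s * m <= n_d - m for s in range(n_dm))
    return distinct and in_range and starts_ok


if __name__ == "__main__":
    print(check_superadditivity(), check_convexity(2.0, [0.3, 1.7, 4.0]), check_blocks(103, 4, 5))
```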

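To illustrate the usefulness of (2.1), consider a stochastic regression
model of the form

$$y_t = g_t(\theta_0) + \varepsilon_t, \qquad t = 1, \ldots, n. \tag{2.4}$$

Here $\{\varepsilon_t\}$ is a martingale difference sequence with respect to $\{\mathcal{G}_t\}$, an
increasing sequence of $\sigma$-fields on $(\Omega, \mathcal{F}, P)$, such that

$$\sup_t E(\varepsilon_t^2 \mid \mathcal{G}_{t-1}) < \infty \quad \text{a.s.}; \tag{2.5}$$

$g_t(\cdot)$ is a $\mathcal{G}_{t-1}$-measurable random function on a compact set $\Theta_1 \subset \mathbb{R}^k$; and
$\theta_0 \in \Theta_1$ is an unknown coefficient vector. The least squares estimate $\hat\theta_n$ of $\theta_0$
is obtained by minimizing

$$S_n(\theta) = \sum_{t=1}^n (y_t - g_t(\theta))^2 \tag{2.6}$$

over $\Theta_1$. The next theorem gives a set of sufficient conditions under
which

$$E\|n^{1/2}(\hat\theta_n - \theta_0)\|^q = O(1), \qquad q \ge 1. \tag{2.7}$$

To state the result, denote the gradient vector and the Hessian matrix of a
smooth function $h: \mathbb{R}^k \to \mathbb{R}$ by $\nabla h(\xi_1, \ldots, \xi_k) = (\partial h/\partial\xi_1, \ldots, \partial h/\partial\xi_k)^T$ and
$\nabla^2 h(\xi_1, \ldots, \xi_k) = (\partial^2 h/\partial\xi_i\partial\xi_j)_{1\le i,j\le k}$, respectively. For $\Theta \subset \mathbb{R}^k$ and $\eta_1 > 0$,
define $S_{\eta_1}(\Theta) = \{\xi: \|\xi - \Theta\| < \eta_1\}$.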

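Theorem 2.2. Consider the stochastic regression model (2.4), in which
$g_t(\cdot)$ is $\mathcal{G}_{t-1}$-measurable and continuous on $\Theta_1$ and the martingale difference sequence $\{\varepsilon_t\}$ satisfies (2.5). Suppose that there exists $\delta_1 > 0$ such
that $B_{\delta_1}(\theta_0) \subset \Theta_1$ and the gradient vector $\nabla g_t$ is continuously differentiable on $B_{\delta_1}(\theta_0)$. Moreover, assume that $\sup_t E(|\varepsilon_t|^{\gamma} \mid \mathcal{G}_{t-1}) < C_5$ a.s. for some
$\gamma > \max\{q, 2\}$ and $C_5 > 0$, and that the following conditions hold:

(i) (C2)-(C4) hold for $\Theta = B_{\delta_1}(\theta_0)$, $\mathbf{f}_t(\theta) = \nabla g_t(\theta)$ and $\mathcal{F}_t = \mathcal{G}_{t-1}$. In
addition, there exists $q_1 > q$ such that

$$\max_{1\le i,j\le k} E\Big\{\sup_{\theta\in B_{\delta_1}(\theta_0)}\Big|n^{-1/2}\sum_{t=1}^n \varepsilon_t(\nabla^2 g_t(\theta))_{ij}\Big|^{4q_1}\Big\} = O(1), \tag{2.8}$$

$$\max_{1\le i,j\le k,\,1\le t\le n} E\Big\{\sup_{\theta\in B_{\delta_1}(\theta_0)}|(\nabla^2 g_t(\theta))_{ij}|^{4q_1}\Big\} = O(1), \tag{2.9}$$

$$\max_{1\le t\le n} E\Big\{\sup_{\theta\in B_{\delta_1}(\theta_0)}\|\nabla g_t(\theta)\|^{4q_1}\Big\} = O(1). \tag{2.10}$$

(ii) For any $\delta_2 > 0$ such that $\Theta_1 - B_{\delta_2}(\theta_0)$ is nonempty, (C2)-(C4) hold
for $\Theta = \Theta_1 - B_{\delta_2}(\theta_0)$, $\mathbf{f}_t(\theta) = g_t(\theta) - g_t(\theta_0)$ and $\mathcal{F}_t = \mathcal{G}_{t-1}$. In addition,
there exist $0 < \nu < 1/2$ and $q_2 > q/(2\nu)$ such that

$$E\Big\{\sup_{\theta\in\Theta_1 - B_{\delta_2}(\theta_0)}\Big|n^{-1}\sum_{t=1}^n \varepsilon_t(g_t(\theta) - g_t(\theta_0))\Big|^{q_2}\Big\} = O(n^{-\nu q_2}). \tag{2.11}$$

(iii) There exists $M > 0$ such that

$$P\Big\{\sup_{\theta\in B_{\delta_1}(\theta_0)}\lambda_{\min}^{-1}\Big(n^{-1}\sum_{t=1}^n \nabla g_t(\theta)(\nabla g_t(\theta))^T\Big) > M\Big\} = O(n^{-q}), \tag{2.12}$$

$$P\Big\{\sup_{\theta\in B_{\delta_1}(\theta_0)} n^{-1}\sum_{t=1}^n \|\nabla g_t(\theta)\|^2 > M\Big\} = O(n^{-q}), \tag{2.13}$$

$$\max_{1\le i,j\le k} P\Big\{\sup_{\theta\in B_{\delta_1}(\theta_0)} n^{-1}\sum_{t=1}^n (\nabla^2 g_t(\theta))_{ij}^2 > M\Big\} = O(n^{-q}). \tag{2.14}$$

Then (2.7) holds.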

Some comments are in order. Conditions (i) and (iii) are needed to prove, in (2.15),
that the $q$th moment of $\|n^{1/2}(\hat\theta_n - \theta_0)\|I_{A_n}$ is asymptotically bounded. Here $A_n$ is the event that $\hat\theta_n$ falls into a small ball around $\theta_0$. Equations (2.9) and (2.10) in condition (i) are similar to Condition 13 of [17]. However,
(2.9) and (2.10) require higher-order moments of $\nabla g_t(\theta)$
and $\nabla^2 g_t(\theta)$ in order to establish inequality (2.26), which plays an important role in
deriving (2.15). Equation (2.8) in condition (i) can be viewed as a "moment"
counterpart to (3.18) of [14]. It can be justified by an argument similar to
(3.8) of [14], which shows that, under certain smoothness
conditions, the supremum of a martingale with values in a Hilbert space $H$ is dominated by its norm in $H$. For more details, see (B.5) and (B.7) of Appendix B. Equations
(2.12)-(2.14) in condition (iii) may at first sight seem unrelated to the typical assumptions made for the LLN and CLT of $\hat\theta_n$. However, like (2.9)
and (2.10), they are needed to derive (2.26). In fact, (2.12) and
(2.13) can be combined into a single assumption: for any $m > 0$,

$$P\Big\{\sup_{\theta\in B_{\delta_1}(\theta_0)}\Big\|n^{-1}\sum_{t=1}^n\big[\nabla g_t(\theta)(\nabla g_t(\theta))^T - E\{\nabla g_t(\theta)(\nabla g_t(\theta))^T\}\big]\Big\| > m\Big\} = O(n^{-q}),$$

where $\|D\|^2 = \sup_{\|\mathbf{x}\|=1}\mathbf{x}^T D^T D\mathbf{x}$ for a matrix $D$. We do not use this assumption,
to avoid complicating the proof of Theorem 2.2.
When $g_t(\theta)$ is a linear process whose coefficient functions satisfy certain
smoothness conditions, (2.12)-(2.14) can be justified by a uniform
version of the first moment bound theorem of Findley and Wei [6]. Further
details can be found in (B.6) and (B.9)-(B.11) of Appendix B. In contrast
to conditions (i) and (iii), condition (ii) is needed to prove, in (2.27), that the $q$th
moment of $\|n^{1/2}(\hat\theta_n - \theta_0)\|I_{B_n}$ is asymptotically bounded. Here
$B_n$ denotes the event that $\hat\theta_n$ falls outside a small ball around $\theta_0$. Finally, (C2)
in condition (ii) provides an identifiability condition for model (2.4). Equation
(2.11) is a moment counterpart to (3.14) of [14] and can be justified in the same way as (2.8).

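Proof of Theorem 2.2. Let $0 < \delta^* < \min\{\delta_1, 3^{-1}k^{-1}M^{-2}\}$ and $A_n = \{\hat\theta_n \in B_{\delta^*}(\theta_0)\}$. We first show that

$$E(\|n^{1/2}(\hat\theta_n - \theta_0)\|^q I_{A_n}) = O(1). \tag{2.15}$$

By the mean value theorem for vector-valued functions, on the set $A_n$,

$$\mathbf{0} = \nabla S_n(\hat\theta_n) = \nabla S_n(\theta_0) + \Big\{\int_0^1 \nabla^2 S_n(\theta_0 + \tau(\hat\theta_n - \theta_0))\,d\tau\Big\}(\hat\theta_n - \theta_0), \tag{2.16}$$

where $S_n(\cdot)$ is defined in (2.6) and the integral of a matrix is understood component-wise. Recall the identities $\nabla S_n(\theta) = -2\sum_{t=1}^n (y_t - g_t(\theta))\nabla g_t(\theta)$ and $\nabla^2 S_n(\theta) = 2\sum_{t=1}^n \nabla g_t(\theta)(\nabla g_t(\theta))^T - 2\sum_{t=1}^n (y_t - g_t(\theta))\nabla^2 g_t(\theta)$. Together with (2.16), they give

$$\sum_{t=1}^n \varepsilon_t\nabla g_t(\theta_0) = (L(\hat\theta_n, \theta_0) - Q(\hat\theta_n, \theta_0))(\hat\theta_n - \theta_0) \quad \text{on } A_n, \tag{2.17}$$

where $L(\hat\theta_n, \theta_0) = \int_0^1 \sum_{t=1}^n \nabla g_t(\theta_0 + \tau(\hat\theta_n - \theta_0))(\nabla g_t(\theta_0 + \tau(\hat\theta_n - \theta_0)))^T\,d\tau$
and $Q(\hat\theta_n, \theta_0) = \int_0^1 \sum_{t=1}^n \{y_t - g_t(\theta_0 + \tau(\hat\theta_n - \theta_0))\}\nabla^2 g_t(\theta_0 + \tau(\hat\theta_n - \theta_0))\,d\tau$.
Direct algebraic manipulation gives

$$\lambda_{\min}(L(\hat\theta_n, \theta_0)) \ge \inf_{\theta\in B_{\delta_1}(\theta_0)}\lambda_{\min}\Big(\sum_{t=1}^n \nabla g_t(\theta)(\nabla g_t(\theta))^T\Big) \quad \text{on } A_n. \tag{2.18}$$

Combining (2.18) with the continuity of $\nabla g_t(\theta)$ on $B_{\delta_1}(\theta_0)$, condition (i) and
Theorem 2.1 yields, for any $s \ge 1$,

$$E\big(\lambda_{\min}^{-s}(n^{-1}L(\hat\theta_n, \theta_0))I_{A_n}\big) \le E\Big\{\sup_{\theta\in B_{\delta_1}(\theta_0)}\lambda_{\min}^{-s}\Big(n^{-1}\sum_{t=1}^n \nabla g_t(\theta)(\nabla g_t(\theta))^T\Big)\Big\} = O(1). \tag{2.19}$$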
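By (2.19), we may assume without loss of generality that
$L^{-1}(\hat\theta_n, \theta_0)$ exists on $A_n$. Hence, by (2.17),

$$\begin{aligned}
\|n^{1/2}(\hat\theta_n - \theta_0)\|I_{A_n}
&\le \|nL^{-1}(\hat\theta_n, \theta_0)\|\,\Big\|n^{-1/2}\sum_{t=1}^n \varepsilon_t\nabla g_t(\theta_0)\Big\|I_{A_n}\\
&\quad + \|nL^{-1}(\hat\theta_n, \theta_0)\|\,\Big\|\int_0^1 n^{-1/2}\sum_{t=1}^n \varepsilon_t\nabla^2 g_t(\theta_0 + \tau(\hat\theta_n - \theta_0))\,d\tau\Big\|\,\|\hat\theta_n - \theta_0\|I_{A_n}\\
&\quad + \|nL^{-1}(\hat\theta_n, \theta_0)\|\,\Big\|\int_0^1 n^{-1}\sum_{t=1}^n \tau(\theta_0 - \hat\theta_n)^T\nabla g_t(\theta^*_{t\tau})\nabla^2 g_t(\theta_0 + \tau(\hat\theta_n - \theta_0))\,d\tau\Big\|\\
&\qquad \times \|n^{1/2}(\hat\theta_n - \theta_0)\|I_{A_n},
\end{aligned}\tag{2.20}$$

where $\theta^*_{t\tau}$ satisfies $\|\theta^*_{t\tau} - \theta_0\| \le \tau\|\hat\theta_n - \theta_0\|$. The Cauchy-Schwarz inequality and Jensen's inequality then give

$$\Big\|\int_0^1 n^{-1/2}\sum_{t=1}^n \varepsilon_t\nabla^2 g_t(\theta_0 + \tau(\hat\theta_n - \theta_0))\,d\tau\Big\| \le k\max_{1\le i,j\le k}\sup_{\theta\in B_{\delta_1}(\theta_0)}\Big|n^{-1/2}\sum_{t=1}^n \varepsilon_t(\nabla^2 g_t(\theta))_{ij}\Big| := kW_n \tag{2.21}$$

and

$$\begin{aligned}
&\Big\|\int_0^1 n^{-1}\sum_{t=1}^n \tau(\theta_0 - \hat\theta_n)^T\nabla g_t(\theta^*_{t\tau})\nabla^2 g_t(\theta_0 + \tau(\hat\theta_n - \theta_0))\,d\tau\Big\|\\
&\quad\le k\|\hat\theta_n - \theta_0\|\Big(\sup_{\theta\in B_{\delta_1}(\theta_0)} n^{-1}\sum_{t=1}^n\|\nabla g_t(\theta)\|^2\Big)^{1/2}\Big(\max_{1\le i,j\le k}\sup_{\theta\in B_{\delta_1}(\theta_0)} n^{-1}\sum_{t=1}^n(\nabla^2 g_t(\theta))_{ij}^2\Big)^{1/2}\\
&\quad := k\|\hat\theta_n - \theta_0\|U_nV_n.
\end{aligned}\tag{2.22}$$

Denote $\sup_{\theta\in B_{\delta_1}(\theta_0)}\lambda_{\min}^{-1}(n^{-1}\sum_{t=1}^n \nabla g_t(\theta)(\nabla g_t(\theta))^T)$ by $R_n$. Combining (2.18) and (2.20)-(2.22), we obtain

$$\begin{aligned}
\|n^{1/2}(\hat\theta_n - \theta_0)\|^q I_{A_n}
&\le 3^q\Big\{R_n^q\Big\|n^{-1/2}\sum_{t=1}^n \varepsilon_t\nabla g_t(\theta_0)\Big\|^q + R_n^q k^q W_n^q\|\hat\theta_n - \theta_0\|^q I_{A_n}\\
&\qquad + R_n^q k^q\|\hat\theta_n - \theta_0\|^q U_n^q V_n^q\|n^{1/2}(\hat\theta_n - \theta_0)\|^q I_{A_n}\Big\}\\
&:= 3^q\{(\mathrm{I}) + (\mathrm{II}) + (\mathrm{III})\}.
\end{aligned}\tag{2.23}$$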

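We apply (2.8), (2.10), (2.19), the bound $\sup_t E(|\varepsilon_t|^{\gamma} \mid \mathcal{G}_{t-1}) < C_5$ a.s., Hölder's inequality and Lemma 2 of Wei [21]. This shows that for $n$ large and some
positive constants $C_1^*$ and $C_2^*$,

$$E((\mathrm{I})) \le C_1^* \tag{2.24}$$

and

$$E((\mathrm{II})) \le C_2^*; \tag{2.25}$$

see Appendix A of Chan and Ing [4] for more details. In addition, using
(2.9), (2.10) and (2.12)-(2.14), we show in Appendix A that for $n$
large,

$$E((\mathrm{III})) \le C_3^* + C_4^* E(\|n^{1/2}(\hat\theta_n - \theta_0)\|^q I_{A_n}), \tag{2.26}$$

where $C_3^*$ and $C_4^*$ are positive constants with $0 < C_4^* < 3^{-q}$. By (2.23)-(2.26), for $n$ large,
$(1 - 3^qC_4^*)E(\|n^{1/2}(\hat\theta_n - \theta_0)\|^q I_{A_n}) \le 3^q(C_1^* + C_2^* + C_3^*)$. The expectation on the left is finite
because $\|n^{1/2}(\hat\theta_n - \theta_0)\|I_{A_n} \le n^{1/2}\delta^*$, so the desired conclusion (2.15) follows.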

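Let $B_n = \{\hat\theta_n \in \bar\Theta_1 = \Theta_1 - B_{\delta^*}(\theta_0)\}$. The rest of the proof
shows that

$$E(\|n^{1/2}(\hat\theta_n - \theta_0)\|^q I_{B_n}) = O(1), \tag{2.27}$$

which, together with (2.15), yields the desired conclusion (2.7).

Since $\Theta_1$ is compact, $\|n^{1/2}(\hat\theta_n - \theta_0)\|^q \le C_5^* n^{q/2}$ for some $C_5^* > 0$. Therefore (2.27) follows once we show that

$$P(B_n) = O(n^{-q/2}). \tag{2.28}$$

By the continuity of $g_t(\cdot)$ on $\Theta_1$, condition (ii) and Theorem 2.1, we have,
for any $s \ge 1$,

$$E\Big\{\Big(\inf_{\theta\in\bar\Theta_1} n^{-1}\sum_{t=1}^n (g_t(\theta) - g_t(\theta_0))^2\Big)^{-s}\Big\} = O(1). \tag{2.29}$$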
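Moreover, $S_n(\hat\theta_n) \le S_n(\theta_0)$ implies $\sum_{t=1}^n (g_t(\hat\theta_n) - g_t(\theta_0))^2 \le 2\sum_{t=1}^n \varepsilon_t(g_t(\hat\theta_n) - g_t(\theta_0))$, and hence

$$B_n \subset \Big\{2\sup_{\theta\in\bar\Theta_1}\Big|n^{-1}\sum_{t=1}^n \varepsilon_t(g_t(\theta) - g_t(\theta_0))\Big| \ge \inf_{\theta\in\bar\Theta_1} n^{-1}\sum_{t=1}^n (g_t(\theta) - g_t(\theta_0))^2\Big\}. \tag{2.30}$$

Since $q_2 > q/(2\nu)$, there exists $\eta_1 > 0$ such that $q_2 = q(1 + \eta_1)/(2\nu)$. By
(2.29), (2.30), (2.11), Chebyshev's inequality and Hölder's inequality, there
exists $C_6 > 0$ such that for all large $n$,

$$\begin{aligned}
P(B_n) &\le C_6\Big\{E\Big(\inf_{\theta\in\bar\Theta_1} n^{-1}\sum_{t=1}^n (g_t(\theta) - g_t(\theta_0))^2\Big)^{-q_2/\eta_1}\Big\}^{\eta_1/(1+\eta_1)}\\
&\qquad\times\Big\{E\Big(\sup_{\theta\in\bar\Theta_1}\Big|n^{-1}\sum_{t=1}^n \varepsilon_t(g_t(\theta) - g_t(\theta_0))\Big|^{q_2}\Big)\Big\}^{1/(1+\eta_1)} = O(n^{-q/2}).
\end{aligned}$$

This establishes (2.28) and proves the theorem. □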

As mentioned in the Introduction, (2.7) can be used to study the
asymptotic properties of the MSPE of $g_{n+1}(\hat\theta_n)$, $E(y_{n+1} - g_{n+1}(\hat\theta_n))^2$. For AR models this quantity is
also known as the final prediction error (FPE); see Akaike [1].
To see this, note first that $\hat\theta_n \to \theta_0$ a.s. under certain mild conditions, such as (2.2) and
(2.3) of [14]. Suppose one can further show that

$$n^{1/2}(\nabla g_{n+1}(\theta_0))^T(\hat\theta_n - \theta_0) \Rightarrow H \tag{2.31}$$

and

$$n\{(\nabla g_{n+1}(\theta_0))^T(\hat\theta_n - \theta_0)\}^2 \text{ is uniformly integrable}, \tag{2.32}$$

where $\Rightarrow$ denotes convergence in distribution and $H$ is a random variable
with $E(H^2) < \infty$. Then

$$\lim_{n\to\infty} nE\{(\nabla g_{n+1}(\theta_0))^T(\hat\theta_n - \theta_0)\}^2 = E(H^2). \tag{2.33}$$

Once (2.33) is established, Taylor's expansion links it to $E(y_{n+1} - g_{n+1}(\hat\theta_n))^2$
as follows:

$$n\{E(y_{n+1} - g_{n+1}(\hat\theta_n))^2 - E(\varepsilon_{n+1}^2)\} = nE\{(\nabla g_{n+1}(\theta_0))^T(\hat\theta_n - \theta_0)\}^2 + E(R_n) \to E(H^2), \tag{2.34}$$

provided the remainder term $R_n$ satisfies $E(R_n) = o(1)$. Property (2.31) can be
established through asymptotic distribution results (see Section 3), while (2.7)
is an important device for establishing (2.32) and $E(R_n) = o(1)$. If, in addition, $E(\varepsilon_t^2) = \sigma^2 > 0$ for all $t > 0$, then (2.34) gives the
asymptotic expression

$$E(y_{n+1} - g_{n+1}(\hat\theta_n))^2 = \sigma^2 + \frac{E(H^2)}{n} + o(n^{-1}). \tag{2.35}$$

The second term in (2.35) is asymptotically negligible compared with
$\sigma^2$. However, $\sigma^2$ does not depend on the fitted model, whereas the second term does, so $E(H^2)$ becomes the key quantity. With (2.7) in hand, the
asymptotic expression (2.35), and in particular $E(H^2)$, can be used to construct optimal
model selection criteria; see, for example, Akaike [1], Wei [22] and Findley
and Wei [7]. See also Section 3 for further discussion.

3. Applications to ARMA models. Let $y_1, \ldots, y_n$ be generated from the
stochastic regression model

$$y_t = g_t(\eta_0) + \varepsilon_t, \qquad t = 1, \ldots, n, \tag{3.1}$$

where $\eta_0 = (\alpha_{0,1}, \ldots, \alpha_{0,p_1}, \beta_{0,1}, \ldots, \beta_{0,p_2})^T$ is an unknown coefficient vector
and $g_t(\eta_0)$ has the ARMA representation

$$g_t(\eta_0) = \alpha_{0,1}y_{t-1} + \cdots + \alpha_{0,p_1}y_{t-p_1} - \beta_{0,1}\varepsilon_{t-1} - \cdots - \beta_{0,p_2}\varepsilon_{t-p_2}, \tag{3.2}$$

with initial conditions $y_t = \varepsilon_t = 0$ for all $t \le 0$. Define

$$\hat\eta_n = \mathop{\arg\min}_{\eta\in\Pi}\sum_{t=1}^n (y_t - g_t(\eta))^2.$$

Here $\Pi \subset \mathbb{R}^{p_1+p_2}$ is a compact set that contains $\eta_0$ as an interior point and
whose elements $\eta = (\alpha_1, \ldots, \alpha_{p_1}, \beta_1, \ldots, \beta_{p_2})^T$ satisfy:

$$A_{1,\eta}(z) = 1 - \sum_{j=1}^{p_1}\alpha_j z^j \ne 0, \quad A_{2,\eta}(z) = 1 - \sum_{j=1}^{p_2}\beta_j z^j \ne 0 \quad \text{for all } |z| \le 1; \tag{3.3}$$

$$A_{1,\eta}(z) \text{ and } A_{2,\eta}(z) \text{ have no common zeros}; \tag{3.4}$$

$$|\alpha_{p_1}| + |\beta_{p_2}| > 0. \tag{3.5}$$

In this section, we apply the results of Section 2 to show that

$$E\|n^{1/2}(\hat\eta_n - \eta_0)\|^q = O(1), \qquad q \ge 1. \tag{3.6}$$

We also apply (3.6) to study the MSPE of $g_{n+1}(\hat\eta_n)$, $E(y_{n+1} - g_{n+1}(\hat\eta_n))^2$. The initial conditions
$y_t = \varepsilon_t = 0$ for all $t \le 0$ are imposed only to simplify the argument. All
results in this section extend straightforwardly to the case where
$(y_t, \varepsilon_t)$ obey the same assumptions for $t \le 0$ as for $t > 0$.

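Let $\eta \in \Pi$. Define $\varepsilon_t(\eta) = 0$ for $t \le 0$, and define $\varepsilon_t(\eta)$ recursively for $t \ge 1$
by

$$\begin{aligned}
\varepsilon_t(\eta) &= y_t - g_t(\eta)\\
&= y_t - \alpha_1 y_{t-1} - \cdots - \alpha_{p_1}y_{t-p_1} + \beta_1\varepsilon_{t-1}(\eta) + \cdots + \beta_{p_2}\varepsilon_{t-p_2}(\eta).
\end{aligned}\tag{3.7}$$

Note that $\varepsilon_t(\eta_0) = \varepsilon_t$. As (2.19) and (2.29) in Section 2 indicate, the crucial step in
obtaining (3.6) is to verify two bounds. First, for some $\delta_1 > 0$ with $B_{\delta_1}(\eta_0) \subset \Pi$
and any $s \ge 1$,

$$E\Big\{\sup_{\eta\in B_{\delta_1}(\eta_0)}\lambda_{\min}^{-s}\Big(n^{-1}\sum_{t=1}^n \nabla\varepsilon_t(\eta)(\nabla\varepsilon_t(\eta))^T\Big)\Big\} = O(1). \tag{3.8}$$

Second, for any $\delta_2 > 0$ with $\bar\Pi = \Pi - B_{\delta_2}(\eta_0) \ne \emptyset$ and any $s \ge 1$,

$$E\Big\{\sup_{\eta\in\bar\Pi}\Big(n^{-1}\sum_{t=1}^n (\varepsilon_t(\eta) - \varepsilon_t(\eta_0))^2\Big)^{-s}\Big\} = O(1). \tag{3.9}$$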
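The Python sketch below implements recursion (3.7); the function names are our own. It simulates (3.1)-(3.2) with zero initial conditions and then computes $\varepsilon_t(\eta)$ and the least squares criterion $\sum_{t=1}^n \varepsilon_t^2(\eta)$. Evaluating at $\eta = \eta_0$ should recover the innovations up to floating-point rounding, in line with $\varepsilon_t(\eta_0) = \varepsilon_t$. Python lists are 0-based, so list position $t$ stores time $t+1$, and any lag that reaches time $\le 0$ is treated as zero.

```python
import random


def simulate_arma(alpha, beta, eps):
    """y_t = sum_i alpha_i y_{t-i} - sum_j beta_j eps_{t-j} + eps_t, as in (3.1)-(3.2),
    with y_t = eps_t = 0 for t <= 0."""
    y = []
    for t in range(len(eps)):
        ar = sum(a * y[t - i] for i, a in enumerate(alpha, start=1) if t - i >= 0)
        ma = sum(b * eps[t - j] for j, b in enumerate(beta, start=1) if t - j >= 0)
        y.append(ar - ma + eps[t])
    return y


def residuals(y, alpha, beta):
    """eps_t(eta) from recursion (3.7), with eps_t(eta) = 0 for t <= 0."""
    e = []
    for t in range(len(y)):
        ar = sum(a * y[t - i] for i, a in enumerate(alpha, start=1) if t - i >= 0)
        ma = sum(b * e[t - j] for j, b in enumerate(beta, start=1) if t - j >= 0)
        e.append(y[t] - ar + ma)
    return e


def criterion(y, alpha, beta):
    """Least squares criterion sum_t (y_t - g_t(eta))^2 = sum_t eps_t(eta)^2."""
    return sum(r * r for r in residuals(y, alpha, beta))


if __name__ == "__main__":
    rng = random.Random(0)
    eps = [rng.gauss(0.0, 1.0) for _ in range(200)]
    y = simulate_arma([0.5], [0.3], eps)
    print(criterion(y, [0.5], [0.3]), sum(e * e for e in eps))  # equal up to rounding
```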



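Denote the $i$th component of $\nabla\varepsilon_t(\eta)$ by $(\nabla\varepsilon_t(\eta))_i$. Straightforward calculations show that for $1 \le i \le p_1$ and $1 \le j \le p_2$,

$$(\nabla\varepsilon_t(\eta))_i = -y_{t-i} + \sum_{s=1}^{p_2}\beta_s(\nabla\varepsilon_{t-s}(\eta))_i, \tag{3.10}$$

$$(\nabla\varepsilon_t(\eta))_{p_1+j} = \varepsilon_{t-j}(\eta) + \sum_{s=1}^{p_2}\beta_s(\nabla\varepsilon_{t-s}(\eta))_{p_1+j}. \tag{3.11}$$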
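Recursions (3.10) and (3.11) can be run alongside (3.7). The Python sketch below uses hypothetical helpers, not code from the paper. It computes $\varepsilon_t(\eta)$ and $\nabla\varepsilon_t(\eta)$ together and compares the gradient with central finite differences. The gradient is ordered as $(\partial/\partial\alpha_1,\ldots,\partial/\partial\alpha_{p_1},\partial/\partial\beta_1,\ldots,\partial/\partial\beta_{p_2})$. The recursions hold for any observed sequence, so the data here are arbitrary. The step size and tolerance are illustrative choices.

```python
import random


def residuals_and_gradient(y, alpha, beta):
    """Return eps_t(eta) from (3.7) and grad eps_t(eta) from (3.10)-(3.11).

    List position t stores time t + 1; quantities at times <= 0 are zero.
    Gradient order: d/d alpha_1, ..., d/d alpha_p1, d/d beta_1, ..., d/d beta_p2.
    """
    p1, p2 = len(alpha), len(beta)
    e, grad = [], []
    for t in range(len(y)):
        ar = sum(alpha[i - 1] * y[t - i] for i in range(1, p1 + 1) if t - i >= 0)
        ma = sum(beta[s - 1] * e[t - s] for s in range(1, p2 + 1) if t - s >= 0)
        e.append(y[t] - ar + ma)
        g = []
        for i in range(1, p1 + 1):  # (3.10)
            val = -y[t - i] if t - i >= 0 else 0.0
            val += sum(beta[s - 1] * grad[t - s][i - 1] for s in range(1, p2 + 1) if t - s >= 0)
            g.append(val)
        for j in range(1, p2 + 1):  # (3.11)
            val = e[t - j] if t - j >= 0 else 0.0
            val += sum(beta[s - 1] * grad[t - s][p1 + j - 1]
                       for s in range(1, p2 + 1) if t - s >= 0)
            g.append(val)
        grad.append(g)
    return e, grad


def finite_difference_gradient(y, alpha, beta, h=1e-6):
    """Central finite differences of eps_t(eta) with respect to each coordinate of eta."""
    eta = list(alpha) + list(beta)
    p1 = len(alpha)
    cols = []
    for k in range(len(eta)):
        up, dn = eta[:], eta[:]
        up[k] += h
        dn[k] -= h
        e_up, _ = residuals_and_gradient(y, up[:p1], up[p1:])
        e_dn, _ = residuals_and_gradient(y, dn[:p1], dn[p1:])
        cols.append([(a - b) / (2.0 * h) for a, b in zip(e_up, e_dn)])
    return [list(row) for row in zip(*cols)]


if __name__ == "__main__":
    rng = random.Random(0)
    y = [rng.gauss(0.0, 1.0) for _ in range(50)]
    _, grad = residuals_and_gradient(y, [0.4, -0.2], [0.3])
    fd = finite_difference_gradient(y, [0.4, -0.2], [0.3])
    print(max(abs(g - f) for gr, fr in zip(grad, fd) for g, f in zip(gr, fr)))
```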



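For $j < 0$, let $c_j^{(1)}(\eta) = c_j^{(2)}(\eta) = 0$. For $j \ge 0$, let $c_j^{(1)}(\eta)$ and $c_j^{(2)}(\eta)$
satisfy

$$\sum_{j=0}^{\infty}c_j^{(1)}(\eta)z^j = -\frac{A_{2,\eta_0}(z)}{A_{2,\eta}(z)A_{1,\eta_0}(z)}, \qquad \sum_{j=0}^{\infty}c_j^{(2)}(\eta)z^j = \frac{A_{1,\eta}(z)A_{2,\eta_0}(z)}{A_{2,\eta}^2(z)A_{1,\eta_0}(z)}. \tag{3.12}$$

By (3.3)-(3.5) and the compactness of $\Pi$, there exist positive constants $K_1$ and $K_2$ such that for all $j \ge 0$ and $i = 1, 2$,

$$\sup_{\eta\in\Pi}|c_j^{(i)}(\eta)| \le K_1\exp(-K_2 j). \tag{3.13}$$

Define $b_j^{(l)}(\eta) = c_{j-l}^{(1)}(\eta)$ for $1 \le l \le p_1$, and $b_j^{(p_1+l)}(\eta) = c_{j-l}^{(2)}(\eta)$ for $1 \le l \le p_2$. Then
(3.10)-(3.13) imply

$$\nabla\varepsilon_t(\eta) = \Big(\sum_{j=1}^{t-1}b_j^{(1)}(\eta)\varepsilon_{t-j}, \ldots, \sum_{j=1}^{t-1}b_j^{(p_1+p_2)}(\eta)\varepsilon_{t-j}\Big)^T \tag{3.14}$$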
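and

$$\max_{1\le i\le p_1+p_2}\sup_{\eta\in\Pi}|b_j^{(i)}(\eta)| \le K_1'\exp(-K_2 j) \quad \text{for some } K_1' > 0. \tag{3.15}$$

Moreover,

$$\varepsilon_t(\eta) - \varepsilon_t(\eta_0) = \sum_{j=1}^{t-1}b_j(\eta)\varepsilon_{t-j}, \tag{3.16}$$

where the $b_j(\eta)$, $j \ge 1$, satisfy $1 + \sum_{j=1}^{\infty}b_j(\eta)z^j = A_{1,\eta}(z)A_{2,\eta_0}(z)/(A_{2,\eta}(z)A_{1,\eta_0}(z))$ and

$$\sup_{\eta\in\Pi}|b_j(\eta)| \le K_3\exp(-K_4 j) \tag{3.17}$$

for some positive constants $K_3$ and $K_4$. The next theorem gives sufficient
conditions under which

$$E\Big\{\sup_{\eta\in\Pi}\lambda_{\min}^{-s}\Big(n^{-1}\sum_{t=1}^n \nabla\varepsilon_t(\eta)(\nabla\varepsilon_t(\eta))^T\Big)\Big\} = O(1) \quad \text{for any } s \ge 1. \tag{3.18}$$

This result immediately implies (3.8).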



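Theorem 3.1. Assume model (3.1), with $g_t(\cdot)$ defined in (3.2) and the $\varepsilon_t$
independent random variables satisfying $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = \sigma^2$ for
all $t \ge 1$. Moreover, assume that there exist positive constants $\alpha_1$, $\xi$ and $M_1$
such that for any $0 < s_2 - s_1 \le \xi$,

$$\sup_{1\le m\le t<\infty,\,\|\mathbf{v}\|=1}|F_{t,m,\mathbf{v}}(s_2) - F_{t,m,\mathbf{v}}(s_1)| \le M_1(s_2 - s_1)^{\alpha_1}, \tag{3.19}$$

where $\mathbf{v} \in \mathbb{R}^m$ and $F_{t,m,\mathbf{v}}(\cdot)$ denotes the distribution function of $\mathbf{v}^T(\varepsilon_t, \ldots, \varepsilon_{t+1-m})^T$. Then (C1)-(C4) hold for $\Theta = \Pi$, $\mathbf{f}_t(\eta) = \nabla\varepsilon_t(\eta)$ and $\mathcal{F}_t = \sigma(\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots)$, the $\sigma$-field generated by $\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots$. Hence, by Theorem 2.1,
(3.18) follows.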

Proof. By (3.3) and (3.12), $\nabla\varepsilon_t(\eta)$ is clearly
continuous on $\Pi$, so (C1) holds. Define $\Lambda = \{\mathbf{a}: \mathbf{a} \in \mathbb{R}^p, \|\mathbf{a}\| = 1\}$,
where $p = p_1 + p_2$. To show (C2), note first that by (3.3)-(3.5), for
any $\lambda \in \Lambda$ and $\eta \in \Pi$ there exists $\delta_2 = \delta_2(\lambda, \eta) > 0$ such that for all large $t$,

$$E(\lambda^T\nabla\varepsilon_t(\eta))^2 \ge \delta_2. \tag{3.20}$$

In addition, (3.14) and (3.15) imply that

$$E(\lambda^T\nabla\varepsilon_t(\eta))^2 \text{ converges to } l(\lambda, \eta) \text{ uniformly on } \Lambda \times \Pi, \tag{3.21}$$

where $l(\lambda, \eta)$ is some nonnegative function on $\Lambda \times \Pi$. Since
$E(\lambda^T\nabla\varepsilon_t(\eta))^2$ is continuous on $\Lambda \times \Pi$, uniform convergence implies that
$l(\lambda, \eta)$ is also continuous on $\Lambda \times \Pi$. By (3.20) and the compactness of $\Lambda \times \Pi$,
$\inf_{\lambda\in\Lambda,\eta\in\Pi} l(\lambda, \eta) > 0$. Together with (3.21), this shows that there exist a positive number $e$ and a positive integer $L$ such that for all $t > L$,

$$\inf_{\lambda\in\Lambda,\eta\in\Pi} E(\lambda^T\nabla\varepsilon_t(\eta))^2 > e > 0. \tag{3.22}$$

For $t > l_1 \ge 1$, define $\nabla\varepsilon_{t,l_1}(\eta) = (\sum_{j=1}^{l_1}b_j^{(1)}(\eta)\varepsilon_{t-j}, \ldots, \sum_{j=1}^{l_1}b_j^{(p)}(\eta)\varepsilon_{t-j})^T$.
By (3.14) and (3.15), there exists a positive integer $L_1(e)$ such
that for all $t > l_1 \ge L_1(e)$,

$$\sup_{\lambda\in\Lambda,\eta\in\Pi}|E(\lambda^T\nabla\varepsilon_t(\eta))^2 - E(\lambda^T\nabla\varepsilon_{t,l_1}(\eta))^2| < e/2. \tag{3.23}$$

By (3.22) and (3.23), for all $t > d_1 = \max\{L, L_1(e)\}$,

$$\inf_{\lambda\in\Lambda,\eta\in\Pi} E(\lambda^T\nabla\varepsilon_{t,d_1}(\eta))^2 > e/2. \tag{3.24}$$

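Denote $\lambda^T(\nabla\varepsilon_{t,d_1}(\eta) - \nabla\varepsilon_t(\eta))$ by $R_t(\lambda, \eta)$ and $\sigma^{-1}(\mathrm{var}(\lambda^T\nabla\varepsilon_{t,d_1}(\eta)))^{1/2}$
by $g_t(\sigma, \lambda, \eta)$. The ratio $\lambda^T\nabla\varepsilon_{t,d_1}(\eta)/g_t(\sigma, \lambda, \eta)$ can be written as $\sum_{j=1}^{d_1}c_j\varepsilon_{t-j}$
with $\sum_{j=1}^{d_1}c_j^2 = 1$. Moreover, $R_t(\lambda, \eta)$ is $\mathcal{F}_{t-d_1}$-measurable, while $\lambda^T\nabla\varepsilon_{t,d_1}(\eta)$ depends only on $\varepsilon_{t-1}, \ldots, \varepsilon_{t-d_1}$, which are independent of $\mathcal{F}_{t-d_1}$. Therefore (3.19) and (3.24) imply that for any $(\lambda, \eta) \in \Lambda \times \Pi$ and
$t > d_1$,

$$\begin{aligned}
P(s_1 < \lambda^T\nabla\varepsilon_t(\eta) \le s_2 \mid \mathcal{F}_{t-d_1})
&= P(s_1 + R_t(\lambda, \eta) < \lambda^T\nabla\varepsilon_{t,d_1}(\eta) \le s_2 + R_t(\lambda, \eta) \mid \mathcal{F}_{t-d_1})\\
&= P\Big(\frac{s_1 + R_t(\lambda, \eta)}{g_t(\sigma, \lambda, \eta)} < \frac{\lambda^T\nabla\varepsilon_{t,d_1}(\eta)}{g_t(\sigma, \lambda, \eta)} \le \frac{s_2 + R_t(\lambda, \eta)}{g_t(\sigma, \lambda, \eta)} \,\Big|\, \mathcal{F}_{t-d_1}\Big)\\
&\le M_1\Big(\frac{\sigma(s_2 - s_1)}{\sqrt{e/2}}\Big)^{\alpha_1},
\end{aligned}\tag{3.25}$$

provided $0 < s_2 - s_1 \le (\xi\sqrt{e/2})/\sigma$. By (3.25), (C2) holds with $d = d_1$,
$M = M_1(\sigma/\sqrt{e/2})^{\alpha_1}$, $\alpha = \alpha_1$ and $\delta = (\xi\sqrt{e/2})/\sigma$.

It is shown in Appendix B that there exists $\tau^{**} > 0$
such that for any $\eta_1, \eta_2 \in \Pi$ with $\|\eta_2 - \eta_1\| < \tau^{**}$,

$$\|\nabla\varepsilon_t(\eta_2) - \nabla\varepsilon_t(\eta_1)\| \le B_t\|\eta_2 - \eta_1\|, \tag{3.26}$$

where the $B_t$ are nonnegative random variables satisfying

$$\sup_{t\ge1}E(B_t^2) < \infty. \tag{3.27}$$

Combining (3.26) and (3.27) gives (C3). Finally, (C4) follows immediately from (3.26), (3.27), (3.14),
(3.15) and the compactness of $\Pi$, which completes the proof. □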
Remark 1. In the proof of Theorem 3.1, (3.19) plays the same role as
(C2) does in the proof of Theorem 2.1. When the $\varepsilon_t$'s are normally distributed,
(3.19) holds with $M_1 = (2\pi\sigma^2)^{-1/2}$, $\alpha_1 = 1$ and any $\xi > 0$. When
the $\varepsilon_t$'s are i.i.d. with an integrable characteristic function, (3.19) holds with any $\xi > 0$, $\alpha_1 = 1$ and some $M_1 > 0$. For more details, see Lemma 4
of [11]. The authors have also extended Theorem 3.1 to autoregressive fractionally integrated
moving average (ARFIMA) models. The proof of this extension is quite involved, so the details will
be reported elsewhere.
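The Gaussian case of Remark 1 can be checked directly. Assume, for this illustration, that the $\varepsilon_t$'s are i.i.d. $N(0,\sigma^2)$. Then for any unit vector $\mathbf v$, the variable $\mathbf v^T(\varepsilon_t,\ldots,\varepsilon_{t+1-m})^T$ is again $N(0,\sigma^2)$. Condition (3.19) therefore reduces to the mean value theorem, with the maximal density $(2\pi\sigma^2)^{-1/2}$ as the constant. The Python sketch below evaluates increment ratios of the normal distribution function on a grid and compares them with that constant. The grid and the values of $\sigma$ are arbitrary choices.

```python
import math


def normal_cdf(x, sigma):
    """Distribution function of N(0, sigma^2)."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def increment_ratio(s1, s2, sigma):
    """(F(s2) - F(s1)) / (s2 - s1) for F the N(0, sigma^2) distribution function."""
    return (normal_cdf(s2, sigma) - normal_cdf(s1, sigma)) / (s2 - s1)


def lipschitz_constant(sigma):
    """M_1 = (2 pi sigma^2)^{-1/2}, the maximum of the N(0, sigma^2) density."""
    return 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)


def worst_ratio(sigma, widths, starts):
    """Largest increment ratio over the intervals (s, s + w] for s in starts, w in widths."""
    return max(increment_ratio(s, s + w, sigma) for s in starts for w in widths)


if __name__ == "__main__":
    grid = [x / 10.0 for x in range(-50, 51)]
    print(worst_ratio(1.0, [0.01], grid), lipschitz_constant(1.0))
```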

Theorem 3.2. Under the same assumptions as in Theorem 3.1, (C1)-(C4) hold for $\mathbf{f}_t(\eta) = \varepsilon_t(\eta) - \varepsilon_t(\eta_0)$, $\Theta = \bar\Pi$ and $\mathcal{F}_t = \sigma(\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots)$. Hence,
by Theorem 2.1, (3.9) follows.
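Expansion (3.16), which underlies Theorem 3.2, can be checked numerically. Under the zero initial conditions of (3.2), every recursion involved is a causal filter started from zero. So (3.16) should hold exactly in finite samples once $b_j(\eta)$ is taken to be the $j$th power-series coefficient of $A_{1,\eta}(z)A_{2,\eta_0}(z)/(A_{2,\eta}(z)A_{1,\eta_0}(z))$. The Python sketch below uses helper names of our own. It computes these coefficients by polynomial multiplication and power-series division. It then compares $\sum_{j}b_j(\eta)\varepsilon_{t-j}$ with $\varepsilon_t(\eta)-\varepsilon_t(\eta_0)$ computed from (3.7), and shows the geometric decay asserted in (3.17). The parameter values are illustrative.

```python
import random


def poly_mul(p, q):
    """Coefficients of p(z) * q(z), lowest degree first."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def series_div(num, den, n_terms):
    """First n_terms power-series coefficients of num(z) / den(z), assuming den[0] != 0."""
    c = []
    for k in range(n_terms):
        acc = num[k] if k < len(num) else 0.0
        for i in range(1, min(k, len(den) - 1) + 1):
            acc -= den[i] * c[k - i]
        c.append(acc / den[0])
    return c


def lag_poly(coefs):
    """A(z) = 1 - sum_j coefs[j-1] z^j, as in (3.3)."""
    return [1.0] + [-x for x in coefs]


def b_coefficients(alpha, beta, alpha0, beta0, n_terms):
    """Coefficients of A_{1,eta} A_{2,eta0} / (A_{2,eta} A_{1,eta0}); entry 0 equals 1."""
    num = poly_mul(lag_poly(alpha), lag_poly(beta0))
    den = poly_mul(lag_poly(beta), lag_poly(alpha0))
    return series_div(num, den, n_terms)


def simulate_arma(alpha, beta, eps):
    """Model (3.1)-(3.2) with zero initial conditions; list position t is time t + 1."""
    y = []
    for t in range(len(eps)):
        ar = sum(a * y[t - i] for i, a in enumerate(alpha, start=1) if t - i >= 0)
        ma = sum(b * eps[t - j] for j, b in enumerate(beta, start=1) if t - j >= 0)
        y.append(ar - ma + eps[t])
    return y


def residuals(y, alpha, beta):
    """eps_t(eta) from recursion (3.7)."""
    e = []
    for t in range(len(y)):
        ar = sum(a * y[t - i] for i, a in enumerate(alpha, start=1) if t - i >= 0)
        ma = sum(b * e[t - j] for j, b in enumerate(beta, start=1) if t - j >= 0)
        e.append(y[t] - ar + ma)
    return e


def max_expansion_error(alpha, beta, alpha0, beta0, eps):
    """Largest |eps_t(eta) - eps_t - sum_j b_j(eta) eps_{t-j}| over the sample."""
    y = simulate_arma(alpha0, beta0, eps)
    e_eta = residuals(y, alpha, beta)
    b = b_coefficients(alpha, beta, alpha0, beta0, len(eps))
    err = 0.0
    for t in range(len(eps)):
        expansion = sum(b[j] * eps[t - j] for j in range(1, t + 1))
        err = max(err, abs(e_eta[t] - eps[t] - expansion))
    return err


if __name__ == "__main__":
    rng = random.Random(0)
    eps = [rng.gauss(0.0, 1.0) for _ in range(100)]
    print(max_expansion_error([0.4], [0.5], [0.6], [0.3], eps))
    print([round(x, 6) for x in b_coefficients([0.4], [0.5], [0.6], [0.3], 8)])
```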

The proof of Theorem 3.2 is omitted, since it is similar to that of Theorem 3.1. Using Theorems 2.2, 3.1 and 3.2 and Lemma B.1 of Appendix B,
the next theorem establishes moment bounds for $n^{1/2}(\hat\eta_n - \eta_0)$; its proof is deferred to Appendix B.

Theorem 3.3. Assume that the assumptions of Theorem 3.1 hold and
that, for some $q_1 > q \ge 1$,

$$\sup_{t\ge1}E|\varepsilon_t|^{4q_1} < \infty. \tag{3.28}$$

Then (3.6) follows.

As an application of Theorem 3.3, Theorem 3.4 below gives an asymptotic expression for the MSPE
of the least squares predictor $g_{n+1}(\hat\eta_n)$, $E\{y_{n+1} - g_{n+1}(\hat\eta_n)\}^2$.

Theorem 3.4. Assume that the assumptions of Theorem 3.1 hold. Moreover, let the $\varepsilon_t$ be i.i.d. random variables satisfying, for some $q_1 > 18$,

$$E|\varepsilon_1|^{q_1} < \infty. \tag{3.29}$$

Then, with $p = p_1 + p_2$,

$$\lim_{n\to\infty} n[E\{y_{n+1} - g_{n+1}(\hat\eta_n)\}^2 - \sigma^2] = p\sigma^2. \tag{3.30}$$

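Proof. Let $\delta_1$ be any positive number such that $B_{\delta_1}(\eta_0) \subset \Pi$, and define $A_n = \{\hat\eta_n \in B_{\delta_1}(\eta_0)\}$ and $A_n^c = \{\hat\eta_n \in \bar\Pi = \Pi - B_{\delta_1}(\eta_0)\}$. By Taylor's
theorem,

$$\begin{aligned}
n^{1/2}(y_{n+1} - g_{n+1}(\hat\eta_n) - \varepsilon_{n+1})
&= n^{1/2}(\nabla\varepsilon_{n+1}(\eta_0))^T(\hat\eta_n - \eta_0)I_{A_n}\\
&\quad + \frac{n^{1/2}}{2}(\hat\eta_n - \eta_0)^T\nabla^2\varepsilon_{n+1}(\eta_n^*)(\hat\eta_n - \eta_0)I_{A_n}\\
&\quad + n^{1/2}(\varepsilon_{n+1}(\hat\eta_n) - \varepsilon_{n+1}(\eta_0))I_{A_n^c}\\
&:= (\mathrm{I}) + (\mathrm{II}) + (\mathrm{III}),
\end{aligned}\tag{3.31}$$

where $\|\eta_n^* - \eta_0\| \le \|\hat\eta_n - \eta_0\|$. In view of (3.31), (3.30) follows immediately once
we show that

$$\lim_{n\to\infty}E(\mathrm{I})^2 = p\sigma^2, \tag{3.32}$$

$$\lim_{n\to\infty}E(\mathrm{II})^2 = 0, \tag{3.33}$$

$$\lim_{n\to\infty}E(\mathrm{III})^2 = 0. \tag{3.34}$$

Using the martingale CLT (cf. [9]) and a truncation argument
from [10], one can show that

$$n^{1/2}\{(\nabla\varepsilon_{n+1}(\eta_0))^T(\hat\eta_n - \eta_0)\}I_{A_n} \Rightarrow \mathbf{F}^T\mathbf{Q}, \tag{3.35}$$

where $\mathbf{Q}$ is distributed as $N(\mathbf{0}, \sigma^2\Gamma^{-1})$ with $\Gamma = \lim_{t\to\infty}E\{\nabla\varepsilon_t(\eta_0)(\nabla\varepsilon_t(\eta_0))^T\}$,
and $\mathbf{F}$ is independent of $\mathbf{Q}$ and satisfies $E(\mathbf{F}) = \mathbf{0}$ and $E(\mathbf{F}\mathbf{F}^T) = \Gamma$. Let
$2 < r < 18/5$. Then Hölder's inequality, Theorem 3.3, (3.15)
and (3.29) give

$$\begin{aligned}
E\{|n^{1/2}(\nabla\varepsilon_{n+1}(\eta_0))^T(\hat\eta_n - \eta_0)|^r\}
&\le E\{\|n^{1/2}(\hat\eta_n - \eta_0)\|^r\|\nabla\varepsilon_{n+1}(\eta_0)\|^r\}\\
&\le (E\|n^{1/2}(\hat\eta_n - \eta_0)\|^{5r/4})^{4/5}(E\|\nabla\varepsilon_{n+1}(\eta_0)\|^{5r})^{1/5} = O(1).
\end{aligned}$$

This implies the uniform integrability of $n\{(\nabla\varepsilon_{n+1}(\eta_0))^T(\hat\eta_n - \eta_0)\}^2 I_{A_n}$.
Combining this with (3.35) yields

$$\lim_{n\to\infty}E[n\{(\nabla\varepsilon_{n+1}(\eta_0))^T(\hat\eta_n - \eta_0)\}^2 I_{A_n}] = E(\mathbf{F}^T\mathbf{Q})^2 = p\sigma^2,$$

and hence (3.32) follows. Moreover, Appendix B of [4] shows that (3.33) and (3.34) also hold, using Theorems 3.2 and 3.3, (3.29)
and an argument similar to the one used to prove (B.8) and (B.12) of Appendix B. Consequently, the desired conclusion (3.30) holds. □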
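To build intuition for (3.30), consider the simplest case $p_1 = 1$, $p_2 = 0$: an AR(1) model with i.i.d. $N(0,1)$ errors. This is an illustrative assumption; Theorem 3.4 itself covers general ARMA models. Because $\varepsilon_{n+1}$ is independent of $(y_1,\ldots,y_n)$, $E\{y_{n+1}-g_{n+1}(\hat\eta_n)\}^2-\sigma^2 = E\{(\hat\eta_n-\eta_0)^2y_n^2\}$. The Python sketch below therefore estimates $n$ times this quantity by Monte Carlo. By (3.30), the result should be close to $p\sigma^2 = 1$ for large $n$. The sample size, replication count and coefficient 0.5 are arbitrary choices.

```python
import random


def excess_mspe(phi, n, reps, seed=0):
    """Monte Carlo estimate of n * {E(y_{n+1} - phi_hat * y_n)^2 - 1}.

    Illustrative assumption: AR(1) data with N(0, 1) errors and y_0 = 0.
    The excess is computed as E{(phi_hat - phi)^2 y_n^2}, which is exact because
    e_{n+1} is independent of y_1, ..., y_n and has mean zero.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        y = [0.0]  # y_0 = 0
        for _ in range(n):
            y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
        num = sum(y[t] * y[t - 1] for t in range(1, n + 1))
        den = sum(y[t - 1] ** 2 for t in range(1, n + 1))
        phi_hat = num / den
        total += (phi_hat - phi) ** 2 * y[n] ** 2
    return n * total / reps


if __name__ == "__main__":
    for n in (50, 200):
        print(n, round(excess_mspe(0.5, n, reps=1000), 3))  # compare with p * sigma^2 = 1
```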

Remark 2. The moment restriction (3.29) is stronger than
needed to prove (3.32) and (3.34). On the other hand, proving (3.33)
requires $E\|n^{1/2}(\hat\eta_n - \eta_0)\|^q = O(1)$ with $q = 9/2$ (see Appendix B
of [4]). Since Theorem 3.3 is a key tool in verifying (3.33), it seems that (3.29) cannot easily be weakened.

In the special case $p_2 = 0$ (the pure AR case), (3.30) was
examined by Fuller and Hasza [8], Kunitomo and Yamamoto [13] and Ing [10].
For $p_2 > 0$, (3.30) was also considered by Yamamoto [23], but a rigorous proof of (3.30) has been lacking in the literature.
By establishing a set of uniform moment bounds, this paper provides a rigorous
proof of (3.30) for the ARMA case.

Equation (3.30) implies that when two competing ARMA models are entertained, the one having fewer estimated parameters also possesses a smaller MSPE, up to terms of order $n^{-1}$. As a result, the principle of parsimony (e.g., Tukey [20]) is now endowed with a precise meaning in the context of ARMA modelling. This principle roughly asserts that mathematical models with the smallest number of parameters are preferred. When $p_2 = 0$, (3.30) was established in Akaike [1] using an ad hoc argument, which immediately led him to develop the final prediction error criterion,

$$\mathrm{FPE} = \frac{n+p}{n-p}\,\frac{1}{n}\sum_{t=1}^{n}\big(y_t-g_t(\hat\eta_n)\big)^2,$$

that is commonly used for AR model selection with optimal prediction efficiency; see Shibata [18] or Ing and Wei [12]. From this perspective, a contribution of (3.30) is that it provides a theoretical foundation for the construction of the FPE criterion for ARMA models. However, whether the FPE criterion (or its variants) is asymptotically efficient (in the sense of [12] or [18]) in ARMA model selection remains an open question.
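As a concrete illustration, here is a minimal sketch of FPE-based order selection for pure AR models, the $p_2 = 0$ case where the criterion was originally proposed. It uses $\mathrm{FPE}(p) = \{(n+p)/(n-p)\}\,n^{-1}\mathrm{RSS}_p$ and rests on several assumptions that the text does not prescribe:

- a zero-mean series;
- conditional least squares fitting;
- the same effective sample size for every candidate order.

These choices, the helper name `fpe_ar` and the simulated AR(2) example are illustrative, not the paper's procedure.

```python
import numpy as np


def fpe_ar(y, max_order):
    """FPE(p) = (n + p) / (n - p) * RSS_p / n for AR(p) fits, p = 1, ..., max_order.

    Illustrative assumptions: zero-mean series, conditional least squares, and the
    same effective sample size n = len(y) - max_order for every candidate order.
    """
    y = np.asarray(y, dtype=float)
    n = len(y) - max_order
    target = y[max_order:]
    scores = {}
    for p in range(1, max_order + 1):
        # Column l holds lag (l + 1), aligned with target = y[max_order:].
        X = np.column_stack([y[max_order - l - 1:len(y) - l - 1] for l in range(p)])
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        rss = float(np.sum((target - X @ coef) ** 2))
        scores[p] = (n + p) / (n - p) * rss / n
    return scores


# Usage on a simulated AR(2) series (coefficients chosen only for illustration).
rng = np.random.default_rng(1)
e = rng.standard_normal(2200)
y = np.zeros(2200)
for t in range(2, 2200):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + e[t]
y = y[200:]  # drop burn-in
scores = fpe_ar(y, max_order=6)
best_order = min(scores, key=scores.get)
print(best_order, round(scores[best_order], 3))
```

The factor $(n+p)/(n-p)$ inflates the in-sample residual variance to approximate out-of-sample MSPE. Because the penalty does not grow with $n$, FPE can overfit, which is consistent with its being aimed at prediction efficiency rather than consistent order identification.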

As a final remark, we note that (3.30) is obtained based on Theorems 2.1 
and 2.2. Moreover, since these theorems provide a useful device for exploring 
the moment properties of least squares estimates in (nonlinear) stochastic re- 
gression models, their applications to prediction or model selection in models 
beyond the ARMA case are anticipated. 

APPENDIX A: PROOFS OF (2.3) AND (2.26) 

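PROOF OF (2.3). Let $m = \lfloor\{l_1(r+2k)+r+k+2q\}/\alpha\rfloor + 1$ with $l_1 > q$. We only prove (2.3) for the case of $l = 0$ and $j = 1$, since the other cases can be verified similarly. First, define

$$A(u)=\Big\{\sum_{i=0}^{m-1}\sup_{\theta\in\Theta}\|\mathbf{f}_{(i+1)d+1}(\theta)\|^2<\frac{u^{l_1/q}}{r}\Big\},\qquad B(u)=\Big\{\sum_{i=0}^{m-1}B_{(i+1)d+1}<\frac{u^{l_1/q}}{k^{1/2}}\Big\},$$

where the $B_t$ are random variables defined in (C3). Then, the left-hand side of (2.3) (with $l = 0$ and $j = 1$) is bounded by

$$\begin{aligned}
&K_0+\int_{K_0}^{\infty}P\Big\{\sup_{\theta\in\Theta}\Big(\inf_{\|\mathbf{y}\|=1}\sum_{i=0}^{m-1}(\mathbf{y}^T\mathbf{f}_{(i+1)d+1}(\theta))^2\Big)^{-q}>u\Big\}\,du\\
&\quad=K_0+\int_{K_0}^{\infty}P\Big\{\inf_{\theta\in\Theta}\inf_{\|\mathbf{y}\|=1}\sum_{i=0}^{m-1}(\mathbf{y}^T\mathbf{f}_{(i+1)d+1}(\theta))^2<u^{-1/q}\Big\}\,du\\
&\quad\le K_0+\int_{K_0}^{\infty}P\Big\{\inf_{\theta\in\Theta}\inf_{\|\mathbf{y}\|=1}\sum_{i=0}^{m-1}(\mathbf{y}^T\mathbf{f}_{(i+1)d+1}(\theta))^2<u^{-1/q},\,A(u),\,B(u)\Big\}\,du\\
&\qquad+\int_{K_0}^{\infty}P(A^c(u))\,du+\int_{K_0}^{\infty}P(B^c(u))\,du=:K_0+(\mathrm{I})+(\mathrm{II})+(\mathrm{III}),
\end{aligned}\tag{A.1}$$

where $K_0 = K_0(l_1,\delta,q,k,\tau)$ is a positive number to be specified later, and $A^c(u)$ and $B^c(u)$ denote the complements of $A(u)$ and $B(u)$, respectively. Since $l_1 > q$, it follows from (C3), (C4) and Chebyshev's inequality that for $n$ large,

$$(\mathrm{II})\le C_1^*\quad\text{and}\quad(\mathrm{III})\le C_2^*,\tag{A.2}$$

where $C_1^*$ and $C_2^*$ are some positive constants depending on $C_1$, $C_2$, $\alpha$, $l_1$, $r$, $k$, $q$ and $K_0$.

To deal with (I), consider the hypersphere $S^r = \{\mathbf{y}: \mathbf{y}\in\mathbb{R}^r, \|\mathbf{y}\| = 1\}$ and the hypercube $H^r(u) = [1-2u^{-(l_1+1)/(2q)}(\lfloor u^{(l_1+1)/(2q)}\rfloor+1),\,1]^r$, $u > 0$. Note first that $S^r\subset H^r(u)$ for any $u > 0$. Divide $H^r(u)$ into sub-hypercubes of equal size. Each has edge length $2u^{-(l_1+1)/(2q)}$ and a circumscribed sphere of radius $\sqrt{r}\,u^{-(l_1+1)/(2q)}$. Denote these sub-hypercubes by $B_i(u)$, $1\le i\le m^* = (\lfloor u^{(l_1+1)/(2q)}\rfloor+1)^r$. Let $G_i(u) = S^r\cap B_i(u)$, and let $\{G_{v_i}(u), i = 1,\dots,m^{**}\}$ denote the collection of nonempty $G_i(u)$'s. It follows that $S^r = \bigcup_{i=1}^{m^{**}}G_{v_i}(u)$ with $m^{**}\le(\lfloor u^{(l_1+1)/(2q)}\rfloor+1)^r$.

On the other hand, $\Theta$ is a bounded subset of $\mathbb{R}^k$, so there is a positive integer $g$ such that for any $u > 0$, $\Theta\subset H^k(u) = [g-2gu^{-(l_1+1/2)/q}(\lfloor u^{(l_1+1/2)/q}\rfloor+1),\,g]^k$. We can similarly divide $H^k(u)$ into equal-sized sub-hypercubes $W_i(u)$, $i = 1,\dots,e^*$, where the edge length of $W_i(u)$ is $2u^{-(l_1+1/2)/q}$ and $e^* = g^k(\lfloor u^{(l_1+1/2)/q}\rfloor+1)^k$. In addition, let $J_i(u) = \Theta\cap W_i(u)$, and let $\{J_{v_i}(u), i = 1,\dots,e^{**}\}$ denote the collection of nonempty $J_i(u)$'s. Then $\Theta = \bigcup_{i=1}^{e^{**}}J_{v_i}(u)$.

On the event in (I), every summand satisfies $(\mathbf{y}^T\mathbf{f}_{(i+1)d+1}(\theta))^2<u^{-1/q}$ for some $\theta\in J_{v_s}(u)$ and some $\mathbf{y}\in G_{v_j}(u)$. Hence

$$P\Big\{\inf_{\theta\in\Theta}\inf_{\|\mathbf{y}\|=1}\sum_{i=0}^{m-1}(\mathbf{y}^T\mathbf{f}_{(i+1)d+1}(\theta))^2<u^{-1/q},\,A(u),\,B(u)\Big\}\le\sum_{s=1}^{e^{**}}\sum_{j=1}^{m^{**}}P\Big(\bigcap_{i=0}^{m-1}C_i^{(s,j)}(u)\Big),\tag{A.3}$$

where

$$C_i^{(s,j)}(u)=\Big\{\inf_{\theta\in J_{v_s}(u)}\inf_{\mathbf{y}\in G_{v_j}(u)}|\mathbf{y}^T\mathbf{f}_{(i+1)d+1}(\theta)|<u^{-1/(2q)},\ \sup_{\theta\in\Theta}\|\mathbf{f}_{(i+1)d+1}(\theta)\|<\frac{u^{l_1/(2q)}}{r^{1/2}},\ B_{(i+1)d+1}<\frac{u^{l_1/q}}{k^{1/2}}\Big\}.$$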



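Let $\mathbf{y}_j\in G_{v_j}(u)$, $j = 1,\dots,m^{**}$, and $\theta_s\in J_{v_s}(u)$, $s = 1,\dots,e^{**}$, be arbitrarily chosen. Then, for any $\mathbf{y}\in G_{v_j}(u)$ and $\theta\in J_{v_s}(u)$,

$$|\mathbf{y}_j^T\mathbf{f}_{(i+1)d+1}(\theta_s)|\le\|\mathbf{y}_j-\mathbf{y}\|\,\|\mathbf{f}_{(i+1)d+1}(\theta_s)\|+\|\mathbf{y}\|\,\|\mathbf{f}_{(i+1)d+1}(\theta_s)-\mathbf{f}_{(i+1)d+1}(\theta)\|+|\mathbf{y}^T\mathbf{f}_{(i+1)d+1}(\theta)|.$$

We have $\|\mathbf{y}_j-\mathbf{y}\|\le 2r^{1/2}u^{-(l_1+1)/(2q)}$ and $\|\theta_s-\theta\|\le 2k^{1/2}u^{-(l_1+1/2)/q}$. Combining these bounds with (C3) yields that, on the set $C_i^{(s,j)}(u)$ with $u > (2k^{1/2}/\tau)^{q/(l_1+1/2)}$,

$$\begin{aligned}|\mathbf{y}_j^T\mathbf{f}_{(i+1)d+1}(\theta_s)|&\le 2r^{1/2}u^{-(l_1+1)/(2q)}\sup_{\theta\in\Theta}\|\mathbf{f}_{(i+1)d+1}(\theta)\|+2k^{1/2}u^{-(l_1+1/2)/q}B_{(i+1)d+1}\\&\quad+\inf_{\theta\in J_{v_s}(u)}\inf_{\mathbf{y}\in G_{v_j}(u)}|\mathbf{y}^T\mathbf{f}_{(i+1)d+1}(\theta)|<5u^{-1/(2q)}.\end{aligned}$$

The final inequality holds because, on $C_i^{(s,j)}(u)$, the three terms on the right-hand side are smaller than $2u^{-1/(2q)}$, $2u^{-1/(2q)}$ and $u^{-1/(2q)}$, respectively. Hence

$$C_i^{(s,j)}(u)\subset D_i^{(s,j)}(u):=\{|\mathbf{y}_j^T\mathbf{f}_{(i+1)d+1}(\theta_s)|<5u^{-1/(2q)}\}.\tag{A.4}$$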
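The grid construction behind $m^{**}$ can be checked numerically, together with the distance bound $\|\mathbf{y}_j-\mathbf{y}\|\le 2\sqrt{r}\,u^{-(l_1+1)/(2q)}$. The sketch below partitions $H^r(u)$ as described and drops random points of $S^r$ into the cells. Sampling only finds occupied cells, so the count it returns is a lower bound on $m^{**}$. The function name `cover_sphere` and the chosen values of $u$, $l_1$, $q$ and $r$ are illustrative assumptions.

```python
import numpy as np


def cover_sphere(u, l1, q, r, n_points=20000, seed=0):
    """Assign random points of S^r to the sub-hypercubes of H^r(u) from the proof of (2.3).

    Returns (occupied cells found, bound on m**, max distance from a point to its
    cell center, circumscribed radius sqrt(r) * u^{-(l1+1)/(2q)}).
    """
    a = (l1 + 1) / (2 * q)
    edge = 2 * u ** (-a)                    # edge length 2 u^{-(l1+1)/(2q)}
    per_axis = int(np.floor(u ** a)) + 1    # [u^{(l1+1)/(2q)}] + 1 cells per axis
    lower = 1 - edge * per_axis             # H^r(u) = [lower, 1]^r
    assert lower <= -1                      # hence S^r is contained in H^r(u)

    rng = np.random.default_rng(seed)
    y = rng.standard_normal((n_points, r))
    y /= np.linalg.norm(y, axis=1, keepdims=True)   # random points on S^r
    idx = np.floor((y - lower) / edge).astype(int)
    idx = np.minimum(idx, per_axis - 1)             # a coordinate equal to 1 goes to the last cell
    centers = lower + (idx + 0.5) * edge
    max_dist = float(np.max(np.linalg.norm(y - centers, axis=1)))
    occupied = len({tuple(row) for row in idx})
    return occupied, per_axis ** r, max_dist, float(np.sqrt(r) * u ** (-a))


occupied, bound, max_dist, radius = cover_sphere(u=50.0, l1=2, q=1.5, r=3)
print(occupied, bound, max_dist <= radius)
```

Two points in the same cell are within twice the circumscribed radius of each other. This is the source of the factor $2r^{1/2}u^{-(l_1+1)/(2q)}$ in the bound preceding (A.4).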

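In view of (A.3) and (A.4), it follows that for $u > (2k^{1/2}/\tau)^{q/(l_1+1/2)}$,

$$P\Big(\inf_{\theta\in\Theta}\inf_{\|\mathbf{y}\|=1}\sum_{i=0}^{m-1}(\mathbf{y}^T\mathbf{f}_{(i+1)d+1}(\theta))^2<u^{-1/q},\,A(u),\,B(u)\Big)\le\sum_{s=1}^{e^{**}}\sum_{j=1}^{m^{**}}P\Big(\bigcap_{i=0}^{m-1}D_i^{(s,j)}(u)\Big).\tag{A.5}$$

Observe that

$$P\Big(\bigcap_{i=0}^{m-1}D_i^{(s,j)}(u)\Big)=E\Big\{\prod_{i=0}^{m-2}I_{D_i^{(s,j)}(u)}\,P\big(D_{m-1}^{(s,j)}(u)\,\big|\,\mathcal{F}_{(m-1)d+1}\big)\Big\},$$

where $I_{D_i^{(s,j)}(u)}$ denotes the indicator function of the set $D_i^{(s,j)}(u)$. This, together with (C2), implies that for $u > (10/\delta)^{2q}$, all $1\le s\le e^{**}$, all $1\le j\le m^{**}$ and $n$ large,

$$P\Big(\bigcap_{i=0}^{m-1}D_i^{(s,j)}(u)\Big)\le M(10)^{\alpha}u^{-\alpha/(2q)}\,P\Big(\bigcap_{i=0}^{m-2}D_i^{(s,j)}(u)\Big).$$

Repeating the same argument $m-1$ times, one has

$$P\Big(\bigcap_{i=0}^{m-1}D_i^{(s,j)}(u)\Big)\le M^m(10)^{\alpha m}u^{-\alpha m/(2q)}.\tag{A.6}$$

Take $K_0 > \max\{(10/\delta)^{2q}, (2k^{1/2}/\tau)^{q/(l_1+1/2)}, 1\}$. For $u\ge 1$ we have $e^{**}m^{**}\le 2^{r+k}g^k u^{\{(l_1+1)r+(2l_1+1)k\}/(2q)}$. It then follows from (A.5), (A.6) and $m > \{l_1(r+2k)+r+k+2q\}/\alpha$ that

$$\begin{aligned}(\mathrm{I})&\le\int_{K_0}^{\infty}\sum_{s=1}^{e^{**}}\sum_{j=1}^{m^{**}}P\Big(\bigcap_{i=0}^{m-1}D_i^{(s,j)}(u)\Big)\,du\\&\le 2^{r+k}g^kM^m(10)^{\alpha m}\int_{K_0}^{\infty}u^{-\{1/(2q)\}\{\alpha m-(l_1+1)r-(2l_1+1)k\}}\,du\\&=2^{r+k}g^kM^m(10)^{\alpha m}\{C(q,\alpha,m,l_1,r,k)\}^{-1}K_0^{-C(q,\alpha,m,l_1,r,k)},\end{aligned}\tag{A.7}$$

where $C(q,\alpha,m,l_1,r,k) = \{\alpha m-(l_1+1)r-(2l_1+1)k-2q\}/(2q)$. Consequently, (2.3) is ensured by (A.1), (A.2) and (A.7). □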
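The choice of $m$ is what makes the integral in (A.7) finite: it forces $C(q,\alpha,m,l_1,r,k) > 0$. The arithmetic can be sketched in exact rational arithmetic, which avoids floating-point rounding in the floor. The helper names `block_length_and_exponent` and `tail_integral_bound` and the numerical inputs are illustrative assumptions.

```python
import math
from fractions import Fraction


def block_length_and_exponent(l1, r, k, q, alpha):
    """m and C(q, alpha, m, l1, r, k) from the proof of (2.3), in exact arithmetic."""
    l1, q, alpha = Fraction(l1), Fraction(q), Fraction(alpha)
    # l1(r + 2k) + r + k + 2q equals (l1 + 1)r + (2 l1 + 1)k + 2q.
    x = l1 * (r + 2 * k) + r + k + 2 * q
    m = math.floor(x / alpha) + 1
    # Since x < alpha * m <= x + alpha, C lies in (0, alpha / (2q)].
    C = (alpha * m - (l1 + 1) * r - (2 * l1 + 1) * k - 2 * q) / (2 * q)
    return m, C


def tail_integral_bound(l1, r, k, q, alpha, M, g, K0):
    """2^{r+k} g^k M^m 10^{alpha m} C^{-1} K0^{-C}, the last expression in (A.7)."""
    m, C = block_length_and_exponent(l1, r, k, q, alpha)
    C = float(C)
    return 2 ** (r + k) * g ** k * M ** m * 10 ** (alpha * m) / C * K0 ** (-C)


m, C = block_length_and_exponent(l1=2, r=3, k=2, q=1.5, alpha=0.5)
print(m, C)  # prints: 45 1/6
```

Because $\int_{K_0}^{\infty}u^{-(C+1)}\,du = C^{-1}K_0^{-C}$, the bound on (I) decreases in $K_0$. It is finite exactly when $C > 0$.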

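PROOF OF (2.26). Let $C_4^* = \delta_1^q k^q M^{2q}$. The constant $\delta_1$, defined at the beginning of the proof of Theorem 2.2, is smaller than $3^{-1}k^{-1}M^{-2}$, so $C_4^* < 3^{-q}$. By the Cauchy-Schwarz inequality and (2.12)-(2.14), one has

$$\begin{aligned}E(\mathrm{III})&\le\delta_1^qk^qn^{q/2}E\big(R_n^qR_{1n}^{q/2}R_{2n}^{q/2}I_{\{R_nR_{1n}^{1/2}R_{2n}^{1/2}>M^2\}}\big)+C_4^*E\big(\|n^{1/2}(\hat\theta_n-\theta_0)\|^qI_{A_n}\big)\\
&\le\delta_1^qk^qn^{q/2}\{E(R_n^{2q}R_{1n}^{q}R_{2n}^{q})\}^{1/2}\{P(R_n>M)+P(R_{1n}>M)+P(R_{2n}>M)\}^{1/2}\\
&\quad+C_4^*E\big(\|n^{1/2}(\hat\theta_n-\theta_0)\|^qI_{A_n}\big)\\
&=O(1)\{E(R_n^{2q}R_{1n}^{q}R_{2n}^{q})\}^{1/2}+C_4^*E\big(\|n^{1/2}(\hat\theta_n-\theta_0)\|^qI_{A_n}\big).\end{aligned}\tag{A.8}$$

The last equality uses the fact that the three probabilities are $O(n^{-q})$ by (2.12)-(2.14). In addition, $E(R_n^{2q}R_{1n}^{q}R_{2n}^{q}) = O(1)$ follows from Hölder's inequality, (2.9), (2.10) and (2.19). Combining this with (A.8) yields (2.26). □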



APPENDIX B: PROOFS OF (3.26), (3.27) AND THEOREM 3.3 

Throughout this Appendix, $J(m,p)$, $1\le m\le p$, denotes the set $\{(j_1,\dots,j_m): j_1<\dots<j_m,\ j_i\in\{1,\dots,p\}\text{ for }1\le i\le m\}$. For $\mathbf{j} = (j_1,\dots,j_m)\in J(m,p)$ and a smooth function $w = w(\xi) = w(\xi_1,\dots,\xi_p)$, $D_{\mathbf{j}}w$ denotes the partial derivative $\partial^m w/\partial\xi_{j_1}\cdots\partial\xi_{j_m}$.

Before proving (3.26) and (3.27), we note the following. By (3.3)-(3.5), (3.10)-(3.14) and the compactness of $\Pi$, $(\nabla^2\varepsilon_t(\eta))_{i,j} = \sum_{s=1}^{t-1}c_{s,ij}(\eta)\varepsilon_{t-s}$. Here the $c_{s,ij}(\eta)$ are continuously differentiable on $\Pi$ and satisfy, for some $D_1, D_2 > 0$ (independent of $i$, $j$ and $s$),

$$\sup_{\eta\in\Pi}|c_{s,ij}(\eta)|\le D_1\exp(-D_2s).\tag{B.1}$$

Moreover, there exists a small positive number $r^*$ such that

$$\sup_{\eta\in\Pi^*}|b_s^{(l)}(\eta)|\le D_3\exp(-D_6s),\tag{B.2}$$

$$\max_{\mathbf{j}\in J(m,p),\,1\le m\le p}\ \sup_{\eta\in\Pi^*}|D_{\mathbf{j}}b_s^{(l)}(\eta)|\le D_4\exp(-D_6s),\tag{B.3}$$

$$\max_{\mathbf{j}\in J(m,p),\,1\le m\le p}\ \sup_{\eta\in\Pi^*}|D_{\mathbf{j}}c_{s,ij}(\eta)|\le D_5\exp(-D_6s),\tag{B.4}$$

where $\Pi^* = \bigcup_{\eta\in\Pi}B_{r^*}(\eta)$ and $D_3,\dots,D_6$ are some positive constants independent of $i$, $j$, $l$ and $s$.
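To see why a bound of the form (B.1) is natural, the following sketch works through the simplest invertible case. The model is an assumption for illustration, not the general ARMA setup of (3.3)-(3.5): an MA(1) $y_t = \varepsilon_t-\theta_0\varepsilon_{t-1}$ with $\varepsilon_t(\theta) = (1-\theta B)^{-1}y_t$. Then $\nabla^2\varepsilon_t(\theta) = 2B^2(1-\theta_0B)(1-\theta B)^{-3}\varepsilon_t$, whose coefficients are polynomially weighted geometric terms. The code computes them over a finite grid of $|\theta|\le\bar\theta<1$, which stands in for the compact $\Pi$, and reads off candidate constants $D_1$ and $D_2 = -\log\rho$ for some $\rho\in(\bar\theta,1)$. The index offset ($\varepsilon_{t-2-s}$ rather than $\varepsilon_{t-s}$) and all numerical values are illustrative.

```python
import math

import numpy as np


def c_coeffs(theta, theta0, S):
    """c_s(theta), s = 0..S, with grad^2 eps_t(theta) = sum_s c_s(theta) eps_{t-2-s}.

    Assumed model: MA(1) y_t = eps_t - theta0 * eps_{t-1} and
    eps_t(theta) = (1 - theta B)^{-1} y_t, so that
    grad^2 eps_t(theta) = 2 B^2 (1 - theta0 B) (1 - theta B)^{-3} eps_t.
    """
    c = np.empty(S + 1)
    c[0] = 2.0
    for s in range(1, S + 1):
        # coefficient of z^s in 2 (1 - theta0 z) (1 - theta z)^{-3}
        c[s] = 2.0 * (math.comb(s + 2, 2) * theta ** s
                      - theta0 * math.comb(s + 1, 2) * theta ** (s - 1))
    return c


theta0, theta_bar, rho, S = 0.4, 0.7, 0.85, 400        # illustrative values, rho in (theta_bar, 1)
grid = np.linspace(-theta_bar, theta_bar, 141)         # finite grid standing in for Pi
sup_c = np.max(np.abs([c_coeffs(th, theta0, S) for th in grid]), axis=0)
ratio = sup_c / rho ** np.arange(S + 1)
D1, D2 = float(ratio.max()), -math.log(rho)            # candidate constants in (B.1)
print(f"D1 = {D1:.2f}, D2 = {D2:.3f}, largest ratio at s = {int(ratio.argmax())}")
```

The polynomial factor $\binom{s+2}{2}$ is absorbed by giving up a little of the geometric rate ($\rho > \bar\theta$). This is why (B.1) holds with a decay rate $D_2$ strictly smaller than $-\log\bar\theta$.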



Proofs of (3.26) and (3.27). Let $r^{**} = r^*/2$. For $\|\eta_2-\eta_1\| < r^{**}$, it follows from the mean value theorem for vector-valued functions that

$$\|\nabla\varepsilon_t(\eta_2)-\nabla\varepsilon_t(\eta_1)\|^2\le\|\eta_2-\eta_1\|^2\Big\|\int_0^1\nabla^2\varepsilon_t(\eta_1+v(\eta_2-\eta_1))\,dv\Big\|^2\le\|\eta_2-\eta_1\|^2(B_t)^2,$$

where $B_t = \{\sum_{1\le i,j\le p}\sup_{\eta\in\Pi^{**}}(\nabla^2\varepsilon_t(\eta))_{i,j}^2\}^{1/2}$, with $\Pi^{**} = \bigcup_{\eta\in\Pi}B_{r^{**}}(\eta)$. Denote by $\bar\Pi^{**}$ the compact closure of $\Pi^{**}$. Then $\bar\Pi^{**}\subset\Pi^*$, which further yields $\bar\Pi^{**}\subset\bigcup_{r=1}^{\bar r}B_{r^*}(\theta_r)$ for some $1\le\bar r<\infty$ and $\theta_1,\dots,\theta_{\bar r}\in\Pi$. Hence,

$$E(B_t^2)\le\sum_{1\le i,j\le p}\sum_{r=1}^{\bar r}E\Big\{\sup_{\eta\in B_{r^*}(\theta_r)}(\nabla^2\varepsilon_t(\eta))_{i,j}^2\Big\}.$$

Moreover, it follows from (B.1), (B.4) and (3.10) of Lai [14] that for all $1\le i,j\le p$, $1\le r\le\bar r$ and $t\ge 3$,

$$E\Big\{\sup_{\eta\in B_{r^*}(\theta_r)}(\nabla^2\varepsilon_t(\eta))_{i,j}^2\Big\}\le C\sum_{s=1}^{t-1}\{\exp(-2D_2s)+\exp(-2D_6s)\}$$

for some $C > 0$ (see Appendix B of [4] for more details). Consequently, (3.26) and (3.27) follow. □

The next lemma, Lemma B.1, provides moment bounds for the suprema of some random functions associated with (2.8) and (2.11)-(2.14). Lemma B.1, together with Theorems 3.1 and 3.2, constitutes the major tool for proving Theorem 3.3.

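Lemma B.1. Let $\theta_a$ be some point in $\mathbb{R}^k$, $k\ge 1$, and let $\delta_1$ be some positive number. For $t\ge 2$, define $K_t(\theta) = \sum_{i=1}^{t-1}c_i(\theta)e_{t-i}$ and $Q_t(\theta) = \sum_{i=1}^{t-1}d_i(\theta)e_{t-i}$. Here the $e_i$ are independent random variables with $E(e_i) = 0$ and $E(e_i^2) = \sigma^2 > 0$ for all $i\ge 1$, and $c_i(\theta)$ and $d_i(\theta)$ are real-valued functions on $B_{\delta_1}(\theta_a)$. Assume the following:

- for any $i\ge 1$, $\mathbf{j}\in J(m,k)$ and $1\le m\le k$, the $D_{\mathbf{j}}c_i(\theta)$ are continuous on $B_{\delta_1}(\theta_a)$;
- for some $q_1 > 2$, $\sup_i E|e_i|^{q_1} < \infty$.

Then, there exists $C > 0$ such that for all $n\ge 2$,

$$E\Big(\sup_{\theta\in B_{\delta_1}(\theta_a)}\Big|\sum_{t=2}^{n}K_t(\theta)\Big|^{q_1}\Big)\le C\Big\{\sum_{j=1}^{n-1}\Big(\sum_{l=1}^{n-j}c_l(\theta_a)\Big)^2\Big\}^{q_1/2}+C\Big\{\sum_{j=1}^{n-1}\max_{\mathbf{j}\in J(m,k),\,1\le m\le k}\ \sup_{\theta\in B_{\delta_1}(\theta_a)}\Big(\sum_{l=1}^{n-j}D_{\mathbf{j}}c_l(\theta)\Big)^2\Big\}^{q_1/2}.\tag{B.5}$$

Moreover, suppose that for any $i,j\ge 1$, $\mathbf{j}\in J(m,k)$ and $1\le m\le k$, the $D_{\mathbf{j}}\{c_i(\theta)d_j(\theta)\}$ are continuous on $B_{\delta_1}(\theta_a)$, and that for some $q_1 > 2$, $\sup_i E|e_i|^{2q_1} < \infty$. Then there exists $C > 0$ such that for all $n\ge 2$,

$$\begin{aligned}&E\Big(\sup_{\theta\in B_{\delta_1}(\theta_a)}\Big|\sum_{t=2}^{n}\{K_t(\theta)Q_t(\theta)-E(K_t(\theta)Q_t(\theta))\}\Big|^{q_1}\Big)\\
&\le C\Big\{\sum_{j=1}^{n-1}\Big(\sum_{l=1}^{n-j}v_{l,l}\Big)^2\Big\}^{q_1/2}+C\Big\{\sum_{j=1}^{n-1}\Big(\sum_{l=1}^{n-j}s_{l,l}\Big)^2\Big\}^{q_1/2}\\
&\quad+Cn^{(q_1-2)/2}\sum_{j=2}^{n-1}\Big[\Big\{\sum_{i=1}^{j-1}\Big(\sum_{l=1}^{n-j}v_{l+j-i,l}\Big)^2\Big\}^{q_1/2}+\Big\{\sum_{i=1}^{j-1}\Big(\sum_{l=1}^{n-j}v_{l,l+j-i}\Big)^2\Big\}^{q_1/2}\\
&\qquad+\Big\{\sum_{i=1}^{j-1}\Big(\sum_{l=1}^{n-j}s_{l+j-i,l}\Big)^2\Big\}^{q_1/2}+\Big\{\sum_{i=1}^{j-1}\Big(\sum_{l=1}^{n-j}s_{l,l+j-i}\Big)^2\Big\}^{q_1/2}\Big],\end{aligned}\tag{B.6}$$

where $v_{i,j} = |c_i(\theta_a)d_j(\theta_a)|$ and $s_{i,j} = \max_{\mathbf{j}\in J(m,k),\,1\le m\le k}\sup_{\theta\in B_{\delta_1}(\theta_a)}|D_{\mathbf{j}}\{c_i(\theta)d_j(\theta)\}|$.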
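Bounds of the type (B.5) start from reindexing the double sum by innovation. Assume, as in Lemma B.1, that $K_t(\theta) = \sum_{i=1}^{t-1}c_i(\theta)e_{t-i}$. Then at a fixed $\theta$, $\sum_{t=2}^{n}K_t = \sum_{j=1}^{n-1}w_je_j$ with $w_j = \sum_{i=1}^{n-j}c_i$. For independent $e_j$ with variance $\sigma^2$, this gives $\mathrm{Var}(\sum_t K_t) = \sigma^2\sum_j w_j^2$, which is the $q_1 = 2$ analogue of the leading term. The sketch checks the identity and the variance numerically. The coefficients $c_i = 0.8^i$ and the helper names are hypothetical choices.

```python
import numpy as np


def sum_K_direct(c, e, n):
    """sum_{t=2}^n K_t with K_t = sum_{i=1}^{t-1} c_i e_{t-i} (index 0 of c and e unused)."""
    return sum(c[i] * e[t - i] for t in range(2, n + 1) for i in range(1, t))


def rearranged_weights(c, n):
    """w_j = sum_{i=1}^{n-j} c_i for j = 1, ..., n-1, so that sum_t K_t = sum_j w_j e_j."""
    return np.array([c[1:n - j + 1].sum() for j in range(1, n)])


rng = np.random.default_rng(0)
n = 50
c = 0.8 ** np.arange(n)          # hypothetical coefficients c_i(theta_a) = 0.8^i
e = rng.standard_normal(n)
w = rearranged_weights(c, n)
direct = float(sum_K_direct(c, e, n))
rearranged = float(w @ e[1:n])

# Monte Carlo check of Var(sum_t K_t) = sigma^2 * sum_j w_j^2 with sigma = 1.
E = rng.standard_normal((20000, n - 1))
var_mc = float(np.var(E @ w))
var_formula = float(w @ w)
print(direct, rearranged, var_mc, var_formula)
```

For $q_1 > 2$, the same weights enter through a moment inequality for weighted sums of independent variables. This is the role that Lemma 2 of [21] plays in the proof of (B.5).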



The proof of (B.5), given in Appendix B of [4], is based on (3.8) of [14] and Lemma 2 of [21]. Assume that $\sup_i E|\varepsilon_i|^{q_1} < \infty$ for some $q_1 > \max\{q, 2\}$ with $q > 1$. Then (B.5) can be used to justify (2.8) for the ARMA case. More precisely, apply (B.5) with $K_t(\theta) = (\nabla^2\varepsilon_t(\eta))_{i,j}$ and $e_t = \varepsilon_t$, in conjunction with (B.1) and (B.4). It follows that for any $\delta_1 > 0$ with $B_{\delta_1}(\eta_0)\subset\Pi$,

$$\max_{1\le i,j\le p}E\Big(\sup_{\eta\in B_{\delta_1}(\eta_0)}\Big|n^{-1/2}\sum_{t=1}^{n}(\nabla^2\varepsilon_t(\eta))_{i,j}\Big|^{q_1}\Big)=O(1).\tag{B.7}$$



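In addition, by making use of (B.5) with $K_t(\theta) = \varepsilon_t(\eta)-\varepsilon_t(\eta_0)$ and $e_t = \varepsilon_t$, together with the compactness of $\Pi$, (3.17) and (B.2), we obtain

$$E\Big(\sup_{\eta\in\Pi}\Big|n^{-1/2}\sum_{t=1}^{n}\varepsilon_t(\varepsilon_t(\eta)-\varepsilon_t(\eta_0))\Big|^{q_1}\Big)=O(1),\tag{B.8}$$

which gives (2.11) (with $q_2 = q_1$ and $v = 1/2$) for the ARMA case.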

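On the other hand, (B.6), whose proof is also given in Appendix B of [4], can be viewed as a uniform version of the first moment bound theorem of [6]. It plays a key role in verifying (2.12)-(2.14) for the ARMA case. Let $M_3$ be any positive number larger than $2D_1^2\sigma^2\sum_{i=1}^{\infty}\exp(-2D_2i)$, where $D_1$ and $D_2$ are defined in (B.1). Let $\delta_1$ be any positive number satisfying $B_{\delta_1}(\eta_0)\subset\Pi$. Assume $\sup_{i\ge 1}E|\varepsilon_i|^{2q_1} < \infty$ for some $q_1 > 2q$ with $q > 1$. Then, by (B.6) with $K_t(\theta) = Q_t(\theta) = (\nabla^2\varepsilon_t(\eta))_{i,j}$ and $e_t = \varepsilon_t$, together with (B.1), (B.4) and Chebyshev's inequality, one has for any $1\le i,j\le p$,

$$\begin{aligned}P\Big\{\sup_{\eta\in B_{\delta_1}(\eta_0)}n^{-1}\sum_{t=1}^{n}(\nabla^2\varepsilon_t(\eta))_{i,j}^2>M_3\Big\}&\le P\Big\{\sup_{\eta\in B_{\delta_1}(\eta_0)}\Big|n^{-1}\sum_{t=1}^{n}\big[(\nabla^2\varepsilon_t(\eta))_{i,j}^2-E\{(\nabla^2\varepsilon_t(\eta))_{i,j}^2\}\big]\Big|^{q_1}>(M_3/2)^{q_1}\Big\}\\&=O(n^{-q_1/2})=O(n^{-q}),\end{aligned}\tag{B.9}$$

The first inequality holds because, by (B.1), $E\{(\nabla^2\varepsilon_t(\eta))_{i,j}^2\}\le D_1^2\sigma^2\sum_{s=1}^{\infty}\exp(-2D_2s) < M_3/2$.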

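This establishes (2.14) for the ARMA case. In addition, (2.12) and (2.13) for the ARMA case can also be verified similarly; that is, for some $M_1, M_2 > 0$,

$$P\Big(\sup_{\eta\in B_{\delta_1}(\eta_0)}\lambda_{\min}^{-1}\Big(n^{-1}\sum_{t=1}^{n}\nabla\varepsilon_t(\eta)(\nabla\varepsilon_t(\eta))^T\Big)>M_1\Big)=O(n^{-q}),\tag{B.10}$$

$$P\Big(\sup_{\eta\in B_{\delta_1}(\eta_0)}n^{-1}\sum_{t=1}^{n}\|\nabla\varepsilon_t(\eta)\|^2>M_2\Big)=O(n^{-q}).\tag{B.11}$$

With the help of these results, we are now in a position to prove Theorem 3.3.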



Proof of Theorem 3.3. Since (3.28) is assumed, (B.7)-(B.11) follow. In view of Theorems 2.2, 3.1 and 3.2, it remains to show the following. For some $q_1 > q > 1$ and some small positive number $\delta_1$ with $B_{\delta_1}(\eta_0)\subset\Pi$,

$$\max_{1\le t\le n}\ \max_{1\le i,j\le p}E\Big(\sup_{\eta\in B_{\delta_1}(\eta_0)}|(\nabla^2\varepsilon_t(\eta))_{i,j}|^{2q_1}\Big)=O(1)\tag{B.12}$$

and

$$\max_{1\le t\le n}E\Big(\sup_{\eta\in B_{\delta_1}(\eta_0)}\|\nabla\varepsilon_t(\eta)\|^{4q_1}\Big)=O(1).\tag{B.13}$$

These equations can be verified using (3.15), (3.28), (B.1), (B.3), (B.4) and an argument similar to (3.10) of [14]. The details are therefore omitted. □